SUMMARY

There are many important examples of $\sqrt{n}$-consistently estimable functionals that are interesting in econometrics, such as average derivatives and nonparametric consumer surplus. Corresponding estimators may require undersmoothing to achieve $\sqrt{n}$-consistency, due to first order bias in the expected influence function. We give a general bias correction that can be added to a plug-in estimator to remove the need for undersmoothing and improve its higher order properties. We also describe a bootstrap smoothing correction for the nonparametric estimator that achieves analogous results for the plug-in estimator and show that idempotent transformations of the empirical distribution need not require undersmoothing for $\sqrt{n}$-consistency. We find that this bias correction can lead to large efficiency improvements and lower sensitivity to bandwidth choice.

Key words and phrases. Functional estimation, nonparametric estimation, $\sqrt{n}$-consistency, undersmoothing, influence function, bias correction.


1.         Introduction 

Functionals of nonparametric estimators that can be $\sqrt{n}$-consistent have important applications in econometrics, including average derivatives and average consumer surplus. Most functional estimators are based on nonparametric estimation and many require "undersmoothing" of the nonparametric estimator to achieve $\sqrt{n}$-consistency, meaning the bias of the nonparametric estimator shrinks faster than its standard deviation. In this paper we show that this requirement can be removed, by either adding a bias correction term to the functional estimator or using a smoothing correction for the nonparametric estimator, leading to $\sqrt{n}$-consistent functional estimation without undersmoothing. These modifications also lead to functional estimators with improved higher-order efficiency, which may attain $\sqrt{n}$-consistency when others do not.

The  source  of  this  improvement  is  a  reduction  in  the  bias  of  the  functional 
estimator.     Many  previous  functional  estimators  have  a  bias  that  is  the  same  order  as  the 
bias  of  the  density  estimator.      In  contrast,   the  estimators  we  develop  have  a  bias  that 
is  the  same  order  as  the  product  of  the  bias  of  the  nonparametric  estimator  with  another 
bias  term.     The  other  bias  term  is  that  for  the  influence  function  of  the  estimator, 
which  is  the  mean-square  derivative  of  the  functional.     We  refer  to  the  corresponding 
reduction  in  bias  order  as  a  bias  complementarity,  with  the  influence  function  bias  term 
complementing  the  nonparametric  bias  term. 

The  properties  of  the  influence  function  play  a  key  role  in  our  analysis.     When  the 
influence  function  is  smooth,   in  certain  ways  to  be  made  precise  below,  the  influence 
function  bias  term  will  be  small  enough  that  the  need  for  undersmoothing  will  be  removed. 
For  some  of  the  estimators  we  consider,  the  influence  function  will  be  required  to  be 
smooth  as  a  function  of  the  density.     This  is  a  regularity  condition  that  is  often 
satisfied.     For  other  estimators  the  influence  function  will  be  required  to  be  smooth  as 
a  function  of  the  data,   a  condition  that  does  not  hold  for  some  functionals. 

We consider two approaches to bias corrected functional estimation. Our first approach is to add to the functional estimator a bias correction, consisting of the difference of integrals of an influence function estimator over the empirical distribution and the nonparametric estimate. This additional term does not affect the asymptotic distribution of the estimator in the $\sqrt{n}$-consistent case but it does lead to improved higher order properties. Our second approach is to do a bootstrap smoothing correction to the nonparametric estimator before using it in functional estimation. This correction consists of replacing a nonparametric distribution estimator $\hat F$ by $\tilde F = \hat F - (\hat G - \hat F) = 2\hat F - \hat G$, where $\hat G$ is an estimator obtained from $\hat F$ by the same transformation used to obtain $\hat F$ from the empirical distribution. We show that using this estimator removes the need for undersmoothing when the influence function is smooth enough. For kernel estimators we show that this smoothing correction leads to a particular kind of kernel, the "twicing" kernel described below. We also show that undersmoothing will not be needed for $\sqrt{n}$-consistency when $\hat F$ is an idempotent transformation of the empirical distribution, so that $\hat G = \hat F$ and $\tilde F = \hat F$, and the bootstrap correction is built into the original estimator. This idempotent estimator class includes orthogonal series density estimators, series estimators of conditional expectations, and sieve estimators, and thus provides an explanation for the lack of an undersmoothing requirement for these estimators.
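
As a rough illustration of the smoothing correction $2\hat F - \hat G$ for kernel estimators, the sketch below assumes scalar data and a Gaussian kernel (both choices are ours, made only for illustration). Under that assumption, smoothing the kernel density estimate once more with the same kernel gives a Gaussian kernel estimate with bandwidth $\sqrt{2}h$, so the corrected density is available in closed form. The function names `gaussian_kde` and `twiced_kde` are hypothetical.

```python
import numpy as np

def gaussian_kde(points, data, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated at `points`."""
    u = (points[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def twiced_kde(points, data, h):
    """Density of 2*F_hat - G_hat for a Gaussian kernel (illustrative assumption).

    Convolving a N(0, h^2) kernel with itself gives a N(0, 2 h^2) kernel, so the
    re-smoothed estimate g_hat is a Gaussian KDE with bandwidth sqrt(2) * h.
    """
    f_hat = gaussian_kde(points, data, h)
    g_hat = gaussian_kde(points, data, np.sqrt(2.0) * h)
    return 2.0 * f_hat - g_hat

rng = np.random.default_rng(0)
data = rng.normal(size=200)          # simulated data, for illustration only
grid = np.linspace(-8.0, 8.0, 1601)
dx = grid[1] - grid[0]
f_tilde = twiced_kde(grid, data, h=0.4)
total_mass = f_tilde.sum() * dx      # Riemann sum; the corrected density still integrates to one
```

The corrected density can take small negative values in the tails, which is the usual price of a bias reducing kernel.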

We  emphasize  that  the  bias  reduction  we  consider  does  not  come  from  reducing  the 
bias  of  the  density  estimator.      Such  higher  order  bias  reductions  (e.g.   via  higher-order 
kernels)  do  not  remove  the  requirement  that  bias  shrinks  faster  than  variance  for  the 
nonparametric  estimator.     That  requirement  can  only  be  removed  if  the  bias  of  the 
functional  estimator  is  smaller  order  than  the  bias  of  the  nonparametric  estimator.     This 
reduction  in  bias  is  brought  about  by  the  bias  complementarity  we  will  discuss. 

The bias reduction may be accompanied by some increase in the variance of the functional estimator. Although the variance still shrinks at the same rate, the size of the variance will be larger. In large samples the reduction in bias will allow adjustment of the smoothing parameters so that the variance is smaller, although the bias reduction could increase the small sample variance of the estimator. Bias reductions for nonparametric estimators (like higher order kernels) have similar properties, except that they depend on higher order properties of the nonparametric estimator whereas our functional bias reduction depends on the properties of the influence function. We find that in a simple example the bias correction can give a large efficiency gain and reduce sensitivity to the choice of bandwidth.

The type of estimator we consider has antecedents in the literature. Bickel and Ritov (1988) developed an estimator for the average density that is $\sqrt{n}$-consistent under minimal smoothness conditions. This estimator has a similar form to the bias corrected functional discussed here, as do estimators in Pastuchova and Hasminskii (1989). Our contribution is to give a general version of this type of estimator and show the improvement in its properties. Also, we show how bootstrap corrected nonparametric estimators remove the need for undersmoothing, and hence that undersmoothing is not needed for idempotent nonparametric estimators.

Section  Two  of  the  paper  describes  the  general  form  of  the  bias  corrected  functional 
we  consider,   and  gives  results  for  kernel  estimators.      Section  Three  describes  a 
nonparametric  estimator  with  a  bootstrap  correction  for  smoothing  and  its  use  for 
functional  estimation.      Section  Four  considers  a  special  class  of  linear  estimators  where 
mean-square  error  calculations  are  feasible,   showing  more  precisely  the  higher  order 
effect  of  the  bias  correction,   and  giving  exact  MSE  calculations  for  one  case.      Section 
Five  analyzes  semiparametric  m-estimators,   shows  that  undersmoothing  is  not  needed  when 
the  nonparametric  estimator  has  no  effect  on  the  limiting  distribution,   and  derives 
results  for  kernel  estimation  with  a  smoothing  correction.     Section  Six  concludes. 

2.         Bias Corrected Functional Estimation

To focus on the essential features of the problem at hand it is useful to begin our discussion with a linear functional $\mu(F) = \int \delta(z)F(dz)$, where $\delta(z)$ is known and $F$ denotes some unsigned measure (charge) on $z$. Although this case is relatively simple, the role of undersmoothing in achieving $\sqrt{n}$-consistency is easily understood here. Let $\hat F$ be some nonparametric estimator of the true distribution $F_0$. For example, $\hat F$ could correspond to a nonparametric density estimator $\hat f$, with $\hat F(z) = \int 1(u \le z)\hat f(u)du$ and $\mu(\hat F) = \int \delta(z)\hat f(z)dz$. Consider the estimator $\hat\mu = \mu(\hat F) = \int \delta(z)\hat F(dz)$ and let $\bar F(z) = E[\hat F(z)]$. Assuming that the order of integration can be interchanged, the bias of this estimator is

(2.1)  $E[\hat\mu] - \mu_0 = \int \delta(z)\bar F(dz) - \int \delta(z)F_0(dz) = \int \delta(z)(\bar F - F_0)(dz)$,

where $\mu_0 = \mu(F_0)$.


If the order of this bias is the same as the order of the pointwise bias of the nonparametric density estimator, as occurs in many cases, then $\sqrt{n}$-consistency will involve the pointwise bias shrinking faster than $1/\sqrt{n}$. Furthermore, the pointwise standard deviation of a nonparametric density estimator generally shrinks no faster than $1/\sqrt{n}$, so that $\sqrt{n}$-consistency will involve the pointwise bias shrinking faster than the pointwise standard deviation. This property is referred to as undersmoothing, since less bias is generally associated with less smoothing and the fastest shrinkage of mean square error is generally associated with bias and standard deviation shrinking at the same rate.

A way to reduce the bias of $\hat\mu$ is to form an estimator of the bias term in equation (2.1) and subtract it off. In this simplest case there is an unbiased estimator $\int \delta(z)\hat F(dz) - \sum_{i=1}^n \delta(z_i)/n$ of the bias term, and subtracting it off gives

(2.2)  $\tilde\mu = \hat\mu + \sum_{i=1}^n \delta(z_i)/n - \int \delta(z)\hat F(dz) = \sum_{i=1}^n \delta(z_i)/n$,

the usual unbiased estimator.
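
A small numerical sketch of the linear case may help. It assumes $\delta(z) = z^2$, scalar data, and a Gaussian kernel density estimator with bandwidth $h$; these are illustrative choices of ours, not taken from the text. Under those assumptions each kernel component is a $N(z_i, h^2)$ density, so the plug-in integral equals the sample average of $z_i^2 + h^2$, the bias estimate is exactly $h^2$, and the corrected estimator reduces to the sample average as in equation (2.2).

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=500)   # simulated data, for illustration only
h = 0.5                    # illustrative bandwidth (assumption)

# Plug-in estimator: integral of z^2 against the Gaussian kernel density estimate.
# Each kernel component is N(z_i, h^2), whose second moment is z_i^2 + h^2.
plug_in = np.mean(z**2 + h**2)

# Bias estimate from the text: integral of delta dF_hat minus the sample average of delta.
bias_estimate = plug_in - np.mean(z**2)

# Equation (2.2): subtracting the bias estimate returns the sample average.
corrected = plug_in - bias_estimate
```

Here the smoothing bias of the plug-in estimator is of order $h^2$, the same order as the pointwise bias of a second order kernel density estimator, which is the situation in which undersmoothing would otherwise be needed.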

For nonlinear functionals the same basic idea can be applied to a first-order bias term. To describe this approach, suppose that $\mu(F)$ has some restricted domain $\mathcal{F}$ to which both $\hat F(z)$ and $F_0(z)$ belong. For example, $\mathcal{F}$ might be restricted to have elements that are absolutely continuous with respect to Lebesgue measure, with a density $f$ corresponding to each $F \in \mathcal{F}$. Also suppose that there is an expansion of $\mu(F)$ on $\mathcal{F}$ such that

(2.3)  $\mu(F) = \mu(F_0) + \int \delta(z,F_0)(F-F_0)(dz) + R(F-F_0,F_0)$,  $|R(F-F_0,F_0)| = o(\|F-F_0\|)$,

where $\|F\|$ denotes a function semi-norm. Let $P$ denote the empirical distribution, $\delta(z) = \delta(z,F_0)$, and $\psi(z) = \delta(z) - E[\delta(z)]$. Then the plug-in estimator $\hat\mu = \mu(\hat F)$ satisfies

(2.4)  $\sqrt{n}(\hat\mu - \mu_0) = \sum_{i=1}^n \psi(z_i)/\sqrt{n} + R_n + B_n$,  $R_n = \sqrt{n}R(\hat F-F_0,F_0)$,  $B_n = \sqrt{n}\int \delta(z)(\hat F-P)(dz)$.

The first order bias of this estimator will be $E[B_n] = \sqrt{n}\int \delta(z)(\bar F-F_0)(dz)$, with $R(\hat F-F_0,F_0)$ including higher order terms. A bias correction can be formed by subtracting an estimate of this first order term. If $\delta(z)$ were known, an unbiased estimate of this first order term would be $\int \delta(z)\hat F(dz) - \sum_{i=1}^n \delta(z_i)/n$. A feasible version of this bias estimate can be formed by replacing $\delta(z)$ with an estimator $\hat\delta(z)$. Then subtracting the corresponding bias estimator gives

(2.5)  $\tilde\mu = \hat\mu + \sum_{i=1}^n \hat\delta(z_i)/n - \int \hat\delta(z)\hat F(dz) = \hat\mu + \int \hat\delta(z)(P-\hat F)(dz)$.

Unlike the linear functional case, this estimator will not be exactly unbiased, because of the higher order term $R(\hat F-F_0,F_0)$ and the estimation of $\delta(z)$. Nevertheless, because we have subtracted an estimator of the first-order bias of $\hat\mu$, this estimator should have smaller bias than $\hat\mu$. In fact, as discussed below, it has an asymptotic expansion that is the same as that of $\hat\mu$ except one of the remainder terms is of smaller order.

A simple example is the average density. Suppose that $\mathcal{F}$ is restricted to contain only absolutely continuous elements with square integrable densities and let $\mu(F) = \int f(z)^2dz$. Let $\hat f$ and $f_0$ denote the densities corresponding to $\hat F$ and $F_0$. Note that $\mu(F) = \mu(F_0) + \int 2f_0(z)[f(z)-f_0(z)]dz + \int [f(z)-f_0(z)]^2dz$, so that equation (2.3) is satisfied with $\delta(z,F_0) = 2f_0(z)$ and $\|F\| = \mu(F)^{1/2}$. Letting $\hat\delta(z) = \delta(z,\hat F) = 2\hat f(z)$, a bias corrected estimator is given by

(2.6)  $\tilde\mu = \int \hat f(z)^2dz + \int 2\hat f(z)(P-\hat F)(dz) = 2\sum_{i=1}^n \hat f(z_i)/n - \int \hat f(z)^2dz$.

In this example the bias corrected estimator is a linear combination of two well known estimators, the plug-in estimator $\int \hat f(z)^2dz$ and the average estimated density $\sum_{i=1}^n \hat f(z_i)/n$.

To see the effect of the bias correction in the general case we can compare remainder terms. We have

(2.7)  $\sqrt{n}(\tilde\mu - \mu_0) = \sum_{i=1}^n \psi(z_i)/\sqrt{n} + R_n + D_n$,  $D_n = \sqrt{n}\int [\delta(z)-\hat\delta(z)](\hat F-P)(dz)$.

The only difference between the expansions for $\hat\mu$ and $\tilde\mu$ is that the remainder term $B_n$ has been replaced by $D_n$. The remainder term $D_n$ should be of smaller order than $B_n$ under appropriate regularity conditions, because the fixed term $\delta(z)$ in $B_n$ has been replaced by $\delta(z)-\hat\delta(z)$, which is shrinking when $\hat\delta(z)$ is consistent. Thus, $\tilde\mu$ has the same expansion as $\hat\mu$ except that one remainder term is smaller, and in this sense improves upon $\hat\mu$. In particular, the remainder term $D_n$ is second-order, in contrast with the first-order remainder $B_n$, and so may have smaller order than the pointwise bias of $\hat f$. Consequently, undersmoothing may not be needed for $\sqrt{n}$-consistency of $\tilde\mu$.

The form of $\tilde\mu$ given in equation (2.5) is like an efficient estimator that is obtained in one step from $\hat\mu$,

$\tilde\mu = \hat\mu + \sum_{i=1}^n \hat\psi(z_i)/n$,  $\hat\psi(z) = \hat\delta(z) - \int \hat\delta(u)\hat F(du)$.

In the semiparametric efficiency literature this procedure is used to improve asymptotic efficiency (e.g. see Bickel et al., 1990), in the sense of lowering the asymptotic variance. Here it does not affect the asymptotic variance of the estimator when $R_n$, $B_n$, and $D_n$ all converge in probability to zero. Instead, it lowers the size of one of the remainder terms, speeding up the convergence rate of the estimator when $B_n$ dominates, improving efficiency in this sense. Although it is difficult to derive exact convergence rates, because of the nonlinearity of $D_n$ and $R_n$, we can compare conditions for $\sqrt{n}$-consistency, which we do in this Section. This comparison is one important aspect of the relative convergence rates for $\hat\mu$ and $\tilde\mu$. In Section Four we consider a different class of estimators where it is possible to compare convergence rates for the original and bias corrected estimators.

To explain this estimator and its properties it is helpful to discuss the role of the expansion in equation (2.3), how to form $\hat\delta(z)$, and to compare conditions for $\sqrt{n}$-consistency of $\hat\mu$ and $\tilde\mu$.

2.1      The  Functional  Expansion 

The formation of this bias corrected estimator, and its properties, depend on the expansion of equation (2.3). Equation (2.3) actually embodies two conditions: i) $\mu(F)$ is Fréchet differentiable with respect to $\|F\|$; ii) there is an integral representation for the derivative. Although it is well known that Fréchet differentiability does not generally hold over a domain that includes empirical distributions, our allowance of a restricted domain and for a choice of semi-norm makes this hypothesis quite general. Many functionals will have Fréchet derivatives when $\mathcal{F}$ and $\|\cdot\|$ are appropriately specified. For example, we noted above that $\mu(F) = \int f(z)^2dz$ is Fréchet differentiable for $\mathcal{F}$ and $\|F\|$ as previously specified.

The other condition embodied in equation (2.3), the integral representation for the linear term, limits the scope of our results to functionals that satisfy the necessary conditions for $\sqrt{n}$-consistent estimability. To see why, consider any parametric family $\{F_\gamma\}$ passing through $F_0$ at $\gamma = 0$ that is regular in the sense of Bickel et al. (1990), with score $S(z) = 2f_0(z)^{-1/2}\,\partial f_\gamma(z)^{1/2}/\partial\gamma$ (where $f_\gamma(z)$ is a density for $F_\gamma(z)$), and such that $\int \delta(z,F_0)^2F_\gamma(dz)$ is bounded as a function of $\gamma$ and $\|F_\gamma-F_0\| = O(\|\gamma\|)$. Then it follows by Bickel et al. (1990) that $\partial\int \delta(z,F_0)F_\gamma(dz)/\partial\gamma = E[\delta(z,F_0)S(z)]$, so that by equation (2.3), $\mu(F_\gamma)$ is differentiable and

(2.8)  $\partial\mu(F_\gamma)/\partial\gamma = E[\delta(z,F_0)S(z)]$,

where the derivatives are evaluated at zero. As shown by Van der Vaart (1991), satisfaction of this equation for all regular parametric families is necessary for existence of a (regular; see Van der Vaart, 1991) $\sqrt{n}$-consistent estimator of $\mu(F_0)$. Additional regularity conditions are often needed to attain $\sqrt{n}$-consistency (i.e. equation (2.8) is only a necessary condition) as discussed in Bickel and Ritov (1988). Often these conditions come in the form of smoothness conditions for $F_0$.

The $\delta(z,F_0)$ term is referred to as the influence function, motivated by the expansion of equation (2.7) where $\delta(z_i,F_0)$ gives the first-order effect, or "influence," of an observation on the estimator $\tilde\mu$. Also, we can think of $\delta(z,F_0)$ as the first-order effect on $\mu(F)$ of changing $F$, with $\int \delta(z,F_0)(F-F_0)(dz)$ in equation (2.3) being analogous to the differential in multivariate calculus, and the influence function $\delta(z,F_0)$ to the gradient.

Estimation of the influence function is important for the bias correction, so it is useful to have a way to calculate it for a given functional. One way is to look for $\delta(z,F_0)$ that solves equation (2.8). Often, the solution to this equation can be determined by manipulating $\partial\mu(F_\gamma)/\partial\gamma$ using properties of derivatives so that it has the expected product form in equation (2.8), and recovering $\delta(z,F_0)$ by inspection. For example, for $\mu(F) = \int f(z)^2dz$, differentiating under the integral gives $\partial\mu(F_\gamma)/\partial\gamma = \partial\int f_\gamma(z)^2dz/\partial\gamma = \int 2f_0(z)[\partial f_\gamma(z)/\partial\gamma]dz = E[2f_0(z)\,\partial\ln f_\gamma(z)/\partial\gamma] = E[2f_0(z)S(z)]$, so $\delta(z,F_0) = 2f_0(z)$ satisfies equation (2.8) (a known result).
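
Equation (2.8) can be checked numerically for the average density in a simple parametric family. The sketch below assumes, purely for illustration, the scale family $f_\gamma = N(0,(1+\gamma)^2)$, for which the score at $\gamma = 0$ is $S(z) = z^2 - 1$; both sides of (2.8) should then be close to $-1/(2\sqrt{\pi})$. The family, grid, and step size are our choices and are not part of the original argument.

```python
import numpy as np

grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]

def density(z, gamma):
    """N(0, (1 + gamma)^2) density; gamma = 0 gives the standard normal f_0."""
    sd = 1.0 + gamma
    return np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def mu(gamma):
    """mu(F_gamma) = integral of f_gamma(z)^2 dz, approximated by a Riemann sum."""
    return np.sum(density(grid, gamma) ** 2) * dx

# Left side of (2.8): central difference of mu(F_gamma) at gamma = 0.
eps = 1e-5
lhs = (mu(eps) - mu(-eps)) / (2.0 * eps)

# Right side of (2.8): E[delta(z, F_0) S(z)] with delta = 2 f_0 and S(z) = z^2 - 1.
f0 = density(grid, 0.0)
score = grid**2 - 1.0
rhs = np.sum(2.0 * f0 * score * f0) * dx
```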


2.2     Implementing  the  Bias  Correction 


Implementing this bias correction in practice requires an estimator $\hat\delta(z)$ of the influence function. One general approach is to find the formula $\delta(z,F_0)$ and then plug in the estimator $\hat F$ to form $\hat\delta(z) = \delta(z,\hat F)$. This was the approach followed to obtain the bias corrected estimator of the integrated squared density in equation (2.6). If the formula $\delta(z,F_0)$ is very complicated (e.g. as in Hausman and Newey, 1995), it might be more feasible to form $\hat\delta(z)$ by another method.

A method of forming $\hat\delta(z)$ that bypasses an explicit formula for $\delta(z,F)$ is available for linear density estimators, where $\hat F(z) = \int 1(u \le z)\hat f(u)du$ and $\hat f(z) = \sum_{i=1}^n \kappa(z,z_i)/n$. For example, kernel estimators have this form for $\kappa(z,z_i) = h^{-r}K((z-z_i)/h)$, where $h$ is a bandwidth parameter with dependence on sample size suppressed for notational convenience, $r$ is the dimension of $z$, and $K(u)$ is a kernel function satisfying $\int K(u)du = 1$ and other properties. In this case a simple derivative calculation can be used for the bias correction. Let $\Delta_z(u) = \int 1(t \le u)\kappa(t,z)dt$, so that $\hat F(\cdot) = \sum_{i=1}^n \Delta_{z_i}(\cdot)/n$. Let $\alpha$ denote a scalar. Suppose that equation (2.3) holds uniformly in $F$ and $F_0$, and replace $F$ by $\hat F + \alpha\Delta_z$ and $F_0$ by $\hat F$. Dividing through by $\alpha$, and assuming $\|\Delta_z\|$ is finite with probability one, as $\alpha \to 0$,

$[\mu(\hat F + \alpha\Delta_z) - \mu(\hat F)]/\alpha = \int \delta(u,\hat F)\kappa(u,z)du + R(\alpha\Delta_z,\hat F)/\alpha$,

$|R(\alpha\Delta_z,\hat F)/\alpha| = o(\alpha\|\Delta_z\|)/\alpha = o(\alpha)/\alpha = o(1)$.

Therefore, at $\alpha = 0$, $\partial\mu(\hat F + \alpha\Delta_z)/\partial\alpha = \int \delta(u,\hat F)\kappa(u,z)du$. For many choices of $\kappa(u,z)$, $\hat\delta(z) = \int \delta(u,\hat F)\kappa(u,z)du$ should be close to $\delta(z,\hat F)$ in large samples, and so $\hat\delta(z)$ could be used to estimate the influence function. Then inserting this estimator in equation (2.5) and interchanging the order of differentiation and integration gives

$\tilde\mu = \hat\mu + \partial\int \mu(\hat F + \alpha\Delta_z)(P-\hat F)(dz)/\partial\alpha\,|_{\alpha=0}$.

Thus, a bias corrected estimator can be calculated by differentiating the difference of the average and integrated functionals with respect to a small increment $\Delta_z$ in $\hat F$. This derivative could even be calculated numerically.
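
A minimal sketch of the numerical derivative, assuming the average density functional, scalar data, a Gaussian kernel, and evaluation of the integrals on a fixed grid (all of these are illustrative assumptions, and the function names are hypothetical). It compares a central difference of $\mu(\hat F + \alpha\Delta_z)$ in $\alpha$ with the analytic expression $\int \delta(u,\hat F)\kappa(u,z)du = \int 2\hat f(u)\kappa(u,z)du$.

```python
import numpy as np

grid = np.linspace(-8.0, 8.0, 3201)
dx = grid[1] - grid[0]

def kernel(u, z, h):
    """Gaussian kappa(u, z) = K((u - z) / h) / h for scalar data (r = 1)."""
    return np.exp(-0.5 * ((u - z) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def mu(f_values):
    """Average density functional, integral of f(u)^2 du, by a Riemann sum on the grid."""
    return np.sum(f_values**2) * dx

def delta_hat_numeric(z, f_hat, h, alpha=1e-4):
    """Central difference in alpha of mu(F_hat + alpha * Delta_z) at alpha = 0."""
    bump = kernel(grid, z, h)   # density of the increment Delta_z
    return (mu(f_hat + alpha * bump) - mu(f_hat - alpha * bump)) / (2.0 * alpha)

rng = np.random.default_rng(2)
data = rng.normal(size=100)     # simulated data, for illustration only
h = 0.5
f_hat = kernel(grid[:, None], data[None, :], h).mean(axis=1)

z0 = 0.3
numeric = delta_hat_numeric(z0, f_hat, h)
# Analytic counterpart: integral of delta(u, F_hat) * kappa(u, z0) du with delta = 2 f_hat.
analytic = np.sum(2.0 * f_hat * kernel(grid, z0, h)) * dx
```

Because this functional is quadratic in $f$, the central difference agrees with the analytic derivative up to rounding error; for other functionals the step size `alpha` would need more care.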

The influence function estimator $\hat\delta(z) = \partial\mu(\hat F + \alpha\Delta_z)/\partial\alpha$ was developed in Newey (1994) for kernel estimators and applied in Hausman and Newey (1995) to solutions to differential equations. It generalizes the delta-method estimator of a function of sample means, in the sense that if $F$ is an unknown constant (rather than a function) and $\Delta_z(u)$ did not depend on $u$, so that $\hat F = \sum_{i=1}^n \Delta_{z_i}/n$ is a sample mean, then $\hat\delta(z) = [\partial\mu(\hat F)/\partial F]'\Delta_z$.


2.3     $\sqrt{n}$-Consistency and Undersmoothing


Sufficient conditions for $\sqrt{n}$-consistency of each estimator are that the corresponding remainder terms in equation (2.7) are $o_p(1)$. We will consider each of these remainder terms in sequence, beginning with $R_n$. The following condition allows us to bound the size of $R_n$.

Assumption 1: There is a set of functions $\mathcal{F}$ and a constant $C$ such that $F_0, \hat F \in \mathcal{F}$ with probability approaching one and for all $F \in \mathcal{F}$,

$\mu(F) = \mu(F_0) + \int \delta(z,F_0)(F-F_0)(dz) + R(F-F_0,F_0)$,  $|R(F-F_0,F_0)| \le C\|F-F_0\|^2$.

This condition formalizes equation (2.3) and imposes a requirement that the size of the remainder be $\|F-F_0\|^2$, which will hold when $\mu(F)$ is twice Fréchet differentiable and $F$ is close enough to $F_0$ (e.g. see Proposition 7.3.3 of Luenberger, 1969). Under this condition $R_n = o_p(1)$ will follow from $\|\hat F-F_0\| = o_p(n^{-1/4})$, i.e. $\hat F$ being more than $n^{1/4}$-consistent. This condition helps ensure that the nonlinearity remainder term $R(\hat F-F_0,F_0)$ is second order.

Consider next $B_n = \sqrt{n}\int \delta(z)(\hat F-P)(dz)$. As previously discussed, $E[B_n] = \sqrt{n}\int \delta(z)(\bar F-F_0)(dz) \to 0$ may require that the bias of $\hat F$ shrink faster than $1/\sqrt{n}$. On the other hand, the variance of $B_n$ should go to zero under weak conditions. For instance, for a linear density estimator, we have $Var(B_n) \le E[\{\int \delta(u)\kappa(u,z)du - \delta(z)\}^2]$, which will go to zero when $\int \delta(u)\kappa(u,z)du$ converges in mean-square to $\delta(z)$.

The remainder term $D_n$ of equation (2.7) is more complicated. It is helpful to decompose it as

$D_n = S_n + T_n$,  $S_n = \nu_n(\hat\delta) - \nu_n(\delta)$,  $\nu_n(d) = \sqrt{n}\int d(z)(P-F_0)(dz)$,

$T_n = -\sqrt{n}\int [\hat\delta(z)-\delta(z)](\hat F-F_0)(dz)$.

The order of $S_n$ depends on the order of $\hat\delta(z)-\delta(z)$ and the modulus of continuity of the empirical process $\nu_n(d)$. Precise conditions for $S_n = o_p(1)$ are available in the literature on empirical processes, e.g. see Van der Vaart and Wellner (1996). Also, it may be possible to use the structure of $\hat\delta(z)$ to show $S_n = o_p(1)$ directly, as in the kernel estimator results of Newey and McFadden (1994). Generally $S_n = o_p(1)$ will not require undersmoothing, because $S_n$ is a second-order term. Also, $T_n$ is a second-order term, being $\sqrt{n}$ times the product of the remainder for $\hat\delta$ and $\hat F$, and so should not require undersmoothing to be $o_p(1)$.

Combining this analysis for the various remainder terms leads to conditions for $\sqrt{n}$-consistency of the estimators. If $\hat F$ converges faster than $n^{-1/4}$ in the norm $\|\cdot\|$ and $E[B_n] \to 0$ then $\hat\mu$ should be $\sqrt{n}$-consistent. When the order of $E[B_n]/\sqrt{n} = \int \delta(z)(\bar F-F_0)(dz)$ is the same as the order of the pointwise bias, these conditions will require undersmoothing. In contrast, $\tilde\mu$ will be $\sqrt{n}$-consistent without undersmoothing if both $\hat\delta$ and $\hat F$ converge faster than $n^{-1/4}$ and $S_n = o_p(1)$.

n         p 

To obtain more precise results we consider kernel estimators, where $\hat F$ has a density $\hat f(z) = \sum_{i=1}^n K_h(z-z_i)/n$ for $K_h(u) = K(u/h)/h^r$. We also restrict the domain of $\mu(F)$ to $F$ that are absolutely continuous with density $f$ and specify the norm $\|F\|$ to be a Sobolev norm in the derivatives of $f$. Let $z$ be $r$-dimensional, let $\lambda$ denote an $r \times 1$ vector of nonnegative integers, $|\lambda| = \sum_{j=1}^r \lambda_j$, $z^\lambda = \prod_{j=1}^r (z_j)^{\lambda_j}$, $\partial^\lambda f(z) = \partial^{|\lambda|}f(z)/\partial z_1^{\lambda_1}\cdots\partial z_r^{\lambda_r}$, let $Z$ denote a compact set, and let

$\|F\| = \max_{|\lambda| \le d}\sup_{z \in Z}|\partial^\lambda f(z)|$,

where $d$ is a nonnegative integer that specifies the highest order derivative of $f$ that affects the norm.

The norm depends on derivatives of $f$ up to order $d$ to allow for $\mu(F)$ to depend on derivatives of $f$ up to this order. An example where $d > 0$ would be needed is weighted average derivatives. We specify a supremum norm to make it relatively easy to show the Fréchet differentiability hypothesis of Assumption 1 and because uniform convergence rates for kernel estimators are readily available, leading to rates of convergence for the remainder terms above.

When combined with Assumption 1 the compactness condition on $Z$ means that $\mu(F)$ can only depend on the values of $f(z)$ for $z$ in a compact set. This restriction will be satisfied if $\mu(F)$ has some fixed trimming built into it, or if $f_0(z)$ is zero outside some compact set and $K(u)$ has bounded support, so that $\hat f(z)$ also will be zero outside some (slightly larger) compact set. For example, for $\mu(F) = \int f(z)^2dz$, Assumption 1 will be satisfied with $d = 0$ if $f_0(z)$ has compact support, $K(u)$ has bounded support, and $Z$ is chosen to be a large enough compact set containing the support of $f_0(z)$ in its interior. We have chosen to impose these types of conditions because they can apply to a wide variety of examples but still lead to relatively simple results that illustrate the bias correction.

The  next  regularity  condition  concerns  the  kernel. 

Assumption 2: $\int K(u)du = 1$, $K(-u) = K(u)$, $K(u)$ has bounded support, and for $s \ge 2$, $K(u)$ is differentiable of order $d$ with Lipschitz $d$-th derivative and $\int K(u)u^\lambda du = 0$ for all $\lambda$ with $0 < |\lambda| < s$.

The symmetry condition is not needed but is convenient. The bounded support condition for the kernel is imposed here to keep the conditions relatively simple. The last condition requires that the kernel be a higher order (bias reducing) kernel of at least order $s$. Because of this higher order kernel assumption the order of the bias in the kernel estimators of up to the $d$-th derivatives of $f_0(z)$ will be no larger than $h^s$, if $f_0(z)$ has at least $s+d$ derivatives. The next condition imposes these smoothness restrictions on $f_0$.

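Assumption 3: $f_0(z)$ is continuously differentiable of order $s+d$ on $\mathbb{R}^r$ with bounded $(s+d)$-th derivatives, and for some $c > 0$ and all $\lambda$ with $|\lambda| = s$, $\int \sup_{\|\Delta\| \le c}|\partial^\lambda f_0(z+\Delta)|dz < \infty$.
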

Under Assumptions 2 and 3 the number $s$ can be thought of as the minimum of the order of
the kernel and a degree of smoothness for $f_0(z)$. If $K(u)$ is a bias reducing kernel of
order $b$ and $f_0(z)$ has $d+a$ continuous derivatives then Assumptions 2 and 3 will be
satisfied with $s = \min\{b, a\}$.

Assumptions 1-3 are sufficient to obtain the large sample properties of the
estimator $\hat\mu$.

Theorem 2.1: If Assumptions 1-3 are satisfied and $h = h_n$ such that $nh^{r+2d}/\ln(n) \to \infty$
and $h \to 0$, and $\delta(z)$ is continuous with probability one and bounded, then

$R_n = O_p(\ln(n)/(\sqrt{n}h^{r+2d}) + \sqrt{n}h^{2s})$,  $B_n = O_p(\sqrt{n}h^{s}) + o_p(1)$.

Also, if $\ln(n)/(\sqrt{n}h^{r+2d}) \to 0$ and $\sqrt{n}h^{s} \to 0$ then $\sqrt{n}(\hat\mu - \mu_0) = \sum_{i=1}^{n}\psi(z_i)/\sqrt{n} + o_p(1)$.

The hypotheses of this result require undersmoothing. The optimal bandwidth (minimum
mean-square error) for estimation of the $d$-th derivative of $f_0(z)$, when $f_0(z)$ has
$d+s$ continuous bounded derivatives and the kernel is $s$-th order, is $h^* = n^{-1/(r+2d+2s)}$,
and $\sqrt{n}(h^*)^{s} = n^{(r+2d)/[2(r+2d+2s)]}$ does not go to zero. Here $h$ must be chosen smaller
than $h^*$ to have $\sqrt{n}h^{s} \to 0$.

To obtain conditions for $\sqrt{n}$-consistency of the bias corrected estimator, it is
essential to be specific about $\hat\delta(z)$. Here we assume that $\hat\delta(z) = \delta(z,\hat F)$, where $\delta(z,F)$
satisfies certain smoothness conditions in $F$.


Assumption 4: There is a set of functions $\mathcal{F}$ such that $F_0 \in \mathcal{F}$ and for small enough
$h$, $\hat F \in \mathcal{F}$ and $\int 1(u \leq z)K_h(u - z_i)\,du \in \mathcal{F}$ with probability one. Also, there is $b(z)$ bounded,
with $b(z) = 0$ for $z \notin \mathcal{Z}$, and $D(z,F)$ that is linear in $F$ such that for $F \in \mathcal{F}$,
$|\delta(z,F) - \delta(z,F_0) - D(z,F - F_0)| \leq b(z)\|F - F_0\|^2$, and $|D(z,F)| \leq b(z)\|F\|$.


This condition follows from second order Fréchet differentiability of $\delta(z,F)$ in $F$,
with bounded derivative. It is helpful in deriving the order of both the stochastic
equicontinuity term $S_n$ and the nonlinear term $T_n$ in $D_n$. We use this assumption to
obtain the order of $S_n$, rather than empirical process methods, because a direct proof
for kernel estimators (as in Newey and McFadden, 1994) seems to allow for a wider class
of functionals. In particular, empirical process methods for showing that $S_n \stackrel{p}{\to} 0$
rely heavily on smoothness of $\delta(z)$ in $z$, which can be avoided by using Assumption 4.
Also, when $\delta(z,F)$ is linear in $F$ (i.e. $b(z) = 0$, as for the average density),
U-statistic theory gives $S_n \stackrel{p}{\to} 0$ under very weak conditions.

These conditions lead to the following result for the bias corrected estimator:

Theorem 2.2: If Assumptions 1-4 are satisfied, and $h = h_n$ such that $nh^{r+2d}/\ln(n) \to \infty$
and $h \to 0$, then

(2.9)  $D_n = O_p(\ln(n)/(\sqrt{n}h^{r+2d}) + \sqrt{n}h^{2s} + h^{s})$.

Also, if $\ln(n)/(\sqrt{n}h^{r+2d}) \to 0$ and $\sqrt{n}h^{2s} \to 0$ then $\sqrt{n}(\tilde\mu - \mu_0) = \sum_{i=1}^{n}\psi(z_i)/\sqrt{n} + o_p(1)$.


The upper bound on $D_n$ obtained here will be smaller than the upper bound obtained for
$B_n$ in Theorem 2.1, for a range of bandwidths $h$. Also, if the bandwidth is chosen to be
the mean-square minimizing value for estimation of the $d$-th derivative of $f_0(z)$, then
the estimator will be $\sqrt{n}$-consistent for any value of $s$ that is large enough so that
there exists an $h$ satisfying the conditions of this theorem. Specifically, existence
of $h$ such that $\ln(n)/(\sqrt{n}h^{r+2d}) \to 0$ and $\sqrt{n}h^{2s} \to 0$ requires $s > d + r/2$, and in
this case the bandwidth which is pointwise optimal for estimation of the $d$-th derivative
of $f_0(z)$, which is $n^{-1/(r+2d+2s)}$, will satisfy the conditions for $\sqrt{n}$-consistency.
This result then shows that $\sqrt{n}$-consistency of $\tilde\mu$ does not require undersmoothing of $\hat f$.

The conditions for $\sqrt{n}$-consistency of $\tilde\mu$ are weaker than the conditions for
$\sqrt{n}$-consistency of $\hat\mu$, in the sense that they require less smoothness. Existence of $h$
satisfying the bandwidth conditions of Theorem 2.1 requires $s > r + 2d$, while existence
of $h$ satisfying the conditions of Theorem 2.2 for $\sqrt{n}$-consistency requires only

(2.10)  $s > (r+2d)/2$.

Thus, with the bias-corrected estimator $f_0(z)$ is only required to have half the number
of derivatives needed for the original estimator, or alternatively, if $f_0(z)$ has all the
derivatives that are needed, the kernel for $\tilde\mu$ need only be half the order of the kernel
for $\hat\mu$ in order to attain $\sqrt{n}$-consistency. This improvement is achieved at the expense
of smoothness of $\delta(z,F)$ as a function of $F$, as in Assumption 4.

3.        Bootstrap Corrected Nonparametric Estimation
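
The bandwidth arithmetic behind these smoothness comparisons can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it assumes the bandwidth $h = n^{-1/(r+2d+2s)}$ and reports the power of $n$ carried by each remainder term, ignoring the $\ln(n)$ factor (which does not matter when the power is strictly negative). The function names are hypothetical.

```python
from fractions import Fraction


def rate_exponents(r, d, s):
    """Powers of n of the remainder terms when h = n**(-1/(r + 2d + 2s))."""
    a = Fraction(1, r + 2 * d + 2 * s)  # h = n**(-a)
    return {
        # sqrt(n) * h**s, the bias term of the plug-in estimator
        "plug_in_bias": Fraction(1, 2) - a * s,
        # sqrt(n) * h**(2s), the bias term of the bias corrected estimator
        "corrected_bias": Fraction(1, 2) - 2 * a * s,
        # 1 / (sqrt(n) * h**(r + 2d)), with the ln(n) factor ignored
        "variance_remainder": a * (r + 2 * d) - Fraction(1, 2),
    }


def corrected_ok_without_undersmoothing(r, d, s):
    """True when both corrected-estimator remainders vanish at this bandwidth."""
    e = rate_exponents(r, d, s)
    return e["corrected_bias"] < 0 and e["variance_remainder"] < 0
```

The plug-in bias power is always positive at this bandwidth, which is the undersmoothing requirement, while the corrected terms vanish exactly when $s > d + r/2$.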

Another  approach  that  also  reduces  the  size  of  the  bias  in  functional  estimation  and 
can  remove  the  need  to  undersmooth,   is  to  plug  in  a  nonparametric  estimator  that  has  been 
corrected for smoothing. To motivate this approach, recall that the first order bias in
$\mu(\hat F)$ is of the order of $E[\hat F] - F_0$. We could eliminate this bias if we could replace $\hat F$ by
the empirical distribution, which is unbiased. Often this replacement is not possible
because smoothing is required to bring $\hat F$ into the domain of $\mu(F)$ (e.g. when $\mu(F)$
depends on the density of $F$) and smoothing induces some bias. However, we can
construct a smooth estimate of the smoothing effect and use it to partially correct for
bias. To describe this correction, suppose that $\hat F = C\hat P$ for some transformation $C$
with domain that includes the empirical distribution $\hat P$. Often $C$ will be a smoothing
transformation, such as convolution for kernel estimators, that depends on the sample
size and gets close to the identity $I$ as the sample size grows. Then for large samples
$C\hat F - \hat F$ should be an estimate of the smoothing effect $C\hat P - \hat P$. Subtracting $C\hat F - \hat F$ from $\hat F$
gives

(3.1)  $\tilde F = \hat F - (C\hat F - \hat F) = 2\hat F - C\hat F = 2C\hat P - C^2\hat P = (2C - C^2)\hat P$,

where $C^2$ is the composition of $C$ with itself. This is a bootstrap correction for
smoothing, in the sense that $\hat P$ is replaced by $\hat F$ in the smoothing effect $C\hat P - \hat P$ to form the
correction $C\hat F - \hat F$.

In this Section we will consider an estimator obtained by plugging in this smoothing
corrected nonparametric estimator, giving $\bar\mu = \mu(\tilde F)$. It will be shown that this
estimator may be $\sqrt{n}$-consistent without undersmoothing. This approach, of bootstrap
correcting the distribution estimator and plugging it in, allows the same nonparametric
estimator to attain the optimal nonparametric convergence rates and be plugged into
functionals that attain the optimal $1/\sqrt{n}$ rate. To achieve this simultaneous
optimality, the influence function will need to satisfy certain conditions that are
detailed below.

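To understand the effect that the bootstrap correction has on the functional
estimator, consider an expansion analogous to equation (2.4) with

(3.2)  $\sqrt{n}(\bar\mu - \mu_0) = \sum_{i=1}^{n}\psi(z_i)/\sqrt{n} + \tilde B_n + \tilde R_n$,  $\tilde R_n = \sqrt{n}R(\tilde F - F_0, F_0)$,  $\tilde B_n = \sqrt{n}\int\delta(z)(\tilde F - \hat P)(dz)$.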


This expansion is the same as for $\mu(\hat F)$ except that $\tilde F$ replaces $\hat F$, with $E[\tilde B_n]$ being
the first order bias term. Suppose that $C$ is linear, with $E[\tilde F] = E[(2C - C^2)\hat P] =
(2C - C^2)F_0$. Also, suppose that there is a transformation $C^*\delta$ of $\delta(z)$ such that
$\int\delta(z)CF(dz) = \int C^*\delta(z)F(dz)$ for $F = F_0$ and $F = CF_0$. With more structure $C^*$ may be
interpreted as the adjoint of $C$, as in examples given below. Then if the order of
integration can be interchanged,

(3.3)  $E[\tilde B_n] = \sqrt{n}\int\delta(z)[(2C - C^2 - I)F_0](dz) = -\sqrt{n}\int\delta(z)[(I - C)^2F_0](dz)$
       $= -\sqrt{n}\int[(I - C^*)\delta](z)[(I - C)F_0](dz)$.

Here $(I - C)F_0$ represents smoothing bias from the transformation $C$, small in large
samples when $C$ is close to $I$. The term $(I - C^*)\delta$ is an analogous term that should
also be close to zero in large samples. Comparing equation (3.3) with $E[B_n] =
-\sqrt{n}\int\delta(z)[(I - C)F_0](dz)$ for $B_n$ from equation (2.4), we see that using the
bootstrap-corrected estimator $\tilde F$ leads to the replacement of $\delta$ by $(I - C^*)\delta$ in the
integral, reducing the first order bias when $(I - C^*)\delta$ is close to zero. This is the
bias complementarity effect referred to above, where the bias in $\hat F$ has been
complemented by the influence function remainder $(I - C^*)\delta$.

Kernel estimators are an important example, where $\hat F$ has density $\hat f(z) =
\int\kappa(z,u)\hat P(du)$ for $\kappa(z,u) = h^{-r}K((z-u)/h)$. Plugging in $\hat F$ for $\hat P$ in the formula
for $\hat f$ leads to a transformation $C$ where $CF$ has density $\int\kappa(z,u)F(du)$. To describe
the bootstrap corrected estimator $\tilde F = 2\hat F - C\hat F$, let $\tilde K(u) = 2K(u) - \int K(u-t)K(t)\,dt$ be the
"twicing" kernel associated with $K(u)$. Then the density of $\tilde F$ will be

(3.4)  $\tilde f(z) = 2\hat f(z) - \int\kappa(z,u)\hat f(u)\,du = 2\hat f(z) - \int\int\kappa(z,u)\kappa(u,t)\,du\,\hat P(dt) = \int\tilde\kappa(z,u)\hat P(du)$,
       $\tilde\kappa(z,u) = 2\kappa(z,u) - \int\kappa(z,t)\kappa(t,u)\,dt = h^{-r}\tilde K((z-u)/h)$.

That  is,   the  smoothing  correction  gives  a  kernel  estimator  with  a  twicing  kernel  that  is 
constructed  from  the  original   kernel. 

We  use  twicing  kernel  estimators  to  illustrate  the  bias  correction,   although  they 
may  not  be  the  best  choice  of  nonparametric  estimator.     A  twicing  kernel  has  order  that 
is  twice  that  of  the  original  kernel  and  is  not  everywhere  positive.     Consequently,   the 
density estimator $\tilde f(z)$ may not be everywhere positive, which may be an undesirable
feature.      Also,   it  is  known  that  using  higher  order  kernels  may  not  improve  mean-square 
error  in  most  sample  sizes  of  interest,   e.g.   see  Marron  and  Wand  (1992).     This  motivates 
a  search  for  other  bootstrap  corrected  estimators,   as  considered  below. 
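
The two properties of the twicing kernel noted above (doubled order, loss of positivity) can be checked numerically. The sketch below is illustrative only: the Epanechnikov kernel, the grid size, and the function names are assumptions made here, not choices from the paper. It builds $\tilde K = 2K - K*K$ on a grid, then computes its total mass and its second moment.

```python
import numpy as np


def epanechnikov(u):
    """Second-order kernel with support [-1, 1] (an illustrative choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def twicing_on_grid(kernel, half_width=1.0, n_points=2001):
    """Evaluate 2K - K*K on an equally spaced grid over [-2*half_width, 2*half_width]."""
    u1 = np.linspace(-half_width, half_width, n_points)
    du = u1[1] - u1[0]
    k = kernel(u1)
    # Discrete approximation of the convolution integral of K with itself.
    conv = np.convolve(k, k) * du
    u2 = np.linspace(-2.0 * half_width, 2.0 * half_width, 2 * n_points - 1)
    return u2, 2.0 * kernel(u2) - conv, du


u, k_tw, du = twicing_on_grid(epanechnikov)
mass = np.sum(k_tw) * du                   # should be close to 1
second_moment = np.sum(u**2 * k_tw) * du   # should be close to 0
```

A second moment near zero is what a fourth-order kernel requires, and `k_tw` takes negative values for $1 < |u| < 2$, where $K$ vanishes but $K*K$ does not.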

To understand the bias formula in equation (3.3) for kernel estimators we need to
find a transformation $C^*$ with $\int C^*\delta(z)F(dz) = \int\delta(z)CF(dz)$. For any $F$ with density
$f$, $CF$ has density $\int\kappa(z,u)f(u)\,du$, so that for $\bar\delta(z) = \int\kappa(u,z)\delta(u)\,du$,

$\int\delta(z)CF(dz) = \int\delta(z)[\int\kappa(z,u)f(u)\,du]\,dz = \int\bar\delta(z)f(z)\,dz = \int C^*\delta(z)F(dz)$,

where $C^*\delta(z) = \bar\delta(z)$. Here $C^*$ is the integral transform obtained by interchanging $z$
and $u$ in $\kappa$, which is known to be the adjoint under certain conditions (Luenberger,
1969, p. 153). This transformation is a convolution, with $\bar\delta(z) = \int K(u)\delta(z+hu)\,du$.
Writing $\bar f(z) = \int\kappa(z,u)f_0(u)\,du$ for the density of $CF_0$, equation (3.3) now becomes

(3.5)  $E[\tilde B_n] = -\sqrt{n}\int[\bar\delta(z) - \delta(z)][\bar f(z) - f_0(z)]\,dz$.

The bias effect of the bootstrap correction is to replace the integrated convolution bias
$\int\delta(z)[\bar f(z) - f_0(z)]\,dz$ with the integral of the product of convolution biases in equation
(3.5). The magnitude of the bias reduction will depend on the convolution bias
$\bar\delta(z) - \delta(z)$, which in turn depends on the smoothness of the influence function, as is well
known from the kernel estimation literature. The following result makes these conditions
precise.

Theorem 3.1: If Assumptions 1-3 are satisfied, $\delta(z)$ is continuously differentiable of
order $t \leq s$ on $\mathbb{R}^r$ with bounded derivatives, $h = h_n$ such that $nh^{r+2d}/\ln(n) \to \infty$
and $h \to 0$, then equation (3.2) is satisfied with $\bar\mu = \mu(\tilde F)$ and $\tilde B_n = O_p(\sqrt{n}h^{s+t} + h^{t})$.
Also, if $\ln(n)/(\sqrt{n}h^{r+2d}) \to 0$ and $\sqrt{n}h^{s+t} \to 0$ then $\sqrt{n}(\bar\mu - \mu_0) = \sum_{i=1}^{n}\psi(z_i)/\sqrt{n} + o_p(1)$.

The upper bound on $\tilde B_n$ is smaller than the bound on $B_n$ in Theorem 2.1 because $\sqrt{n}h^{s}$
has been replaced by $\sqrt{n}h^{s+t}$. This replacement means that sufficient conditions for
$\sqrt{n}$-consistency of $\mu(\tilde F)$ are different than for the original plug-in estimator $\hat\mu$. If
the bandwidth is chosen so the dominating remainder terms $\ln(n)/(\sqrt{n}h^{r+2d})$ and $\sqrt{n}h^{s+t}$
are asymptotically proportional then $\mu(\tilde F)$ will be $\sqrt{n}$-consistent if

(3.6)  $s + t > r + 2d$.

This condition allows some tradeoff of smoothness of $f_0(z)$ and $\delta(z)$ for attaining
$\sqrt{n}$-consistency.

This estimator also attains $\sqrt{n}$-consistency without undersmoothing if the influence
function is smooth enough. Consider the case where $f_0(z)$ is only $s$ times
differentiable, so that the order of the pointwise bias in $\tilde f$ is $h^{s}$. Choose the
bandwidth so that the squared bias order $h^{2s}$ is proportional to the order $n^{-1}h^{-r-2d}$
of the pointwise variance. Then the conditions for $\sqrt{n}$-consistency are satisfied if

(3.7)  $t > r/2 + d$.

The smoothing bias correction depends on smoothness of the influence function in $z$,
in contrast to the bias correction of Section 2, which depends on smoothness of the
influence function in $F$. The bias correction of Section 2 may still give $\sqrt{n}$-consistency
even when $\delta(z)$ is not smooth in $z$. For example, the functional $\mu(F) =
\int 1(a \leq z \leq b)f(z)^2\,dz$ has $\delta(z) = 1(a \leq z \leq b)2f_0(z)$ that is discontinuous when the density is
positive at $a$ or $b$, and so does not satisfy the conditions of Theorem 3.1 for any $t$.
Nevertheless, $\hat\delta(z) = 1(a \leq z \leq b)2\hat f(z)$ is an influence function estimator that would
satisfy the conditions of Theorem 2.2, so that the bias corrected estimator of equation
(2.5) could attain $\sqrt{n}$-consistency without undersmoothing.

It  is  interesting  to  compare  the  properties  of  a  plug-in  estimator  based  on  a 
twicing  kernel  with  one  based  on  another  kernel  of  the  same  order  as  the  twicing  kernel. 
For  simplicity  suppose  that     d  =  0.      Consider  a  plug-in  estimator  with  an  ordinary  kernel 
where $s > r$ and the bandwidth is chosen to give $\sqrt{n}$-consistency. The plug-in estimator
with a twicing kernel of the same order will also be $\sqrt{n}$-consistent for the same bandwidth
choice, if $t = s > r/2$. That is, $\sqrt{n}$-consistency with a twicing kernel only requires
$f_0(z)$ to be half as smooth, if $\delta(z)$ also is smooth enough. Also, the same bandwidth
no  longer  has  to  involve  undersmoothing,   because  only  half  as  many  derivatives  of  the 
density  are  needed  to  exist,   and  the  optimal  bandwidth  for  estimating  a  density  with 
fewer  derivatives  will  be  smaller. 

There  are  other  nonparametric  estimators  that  have  the  same  functional  bias 
reduction property as twicing kernel estimators. A particularly important class are
those where the smoothing transformation is idempotent, with $C\hat F = \hat F$. Here $\tilde F = 2\hat F - C\hat F
= 2\hat F - \hat F = \hat F$, so the bootstrap smoothing correction is "built into" $\hat F$. Our results then
indicate  that  undersmoothing  may  not  be  needed  in  this  case. 

One idempotent nonparametric estimator is an orthogonal series density estimator.
Let $(p_j(u),\ j = 1, 2, \ldots)$ be a sequence of functions that are orthonormal with respect
to Lebesgue measure on $\mathbb{R}^r$, i.e. $\int p_j(u)p_k(u)\,du = 1$ if $j = k$ and equal to zero
otherwise. Let $p^J(u) = (p_1(u), \ldots, p_J(u))'$ and $\hat a = \sum_{i=1}^{n}p^J(z_i)/n$. Then an orthogonal
series estimator is

$\hat f(z) = p^J(z)'\hat a = \int\kappa(z,u)\hat P(du)$,  $\kappa(z,u) = p^J(z)'p^J(u)$.


Like the kernel estimator, $\hat f$ has the linear form $\int\kappa(z,u)\hat P(du)$, but $\kappa(z,u)$ is now an
inner product of orthonormal functions rather than a kernel. To see the effect of the
bootstrap correction, plug in $\hat F$ for $\hat P$ in the density formula to obtain

$\int\kappa(z,u)\hat f(u)\,du = \int[\int\kappa(z,u)\kappa(u,t)\,du]\hat P(dt)$
$= \int p^J(z)'[\int p^J(u)p^J(u)'\,du]p^J(t)\hat P(dt) = \int\kappa(z,t)\hat P(dt) = \hat f(z)$,

so that the transformation $C$ is idempotent. Note that $\bar f(z) = E[\hat f(z)] =
p^J(z)'\int p^J(u)f_0(u)\,du$ is the minimum integrated squared error (ISE) approximation to
$f_0(z)$ and $\bar\delta(z) = \int\delta(u)\kappa(u,z)\,du = p^J(z)'\int\delta(u)p^J(u)\,du$ is the minimum ISE approximation
to $\delta(z)$. Then by $\int\bar\delta(z)[f_0(z) - \bar f(z)]\,dz = 0$,

$E[B_n] = \sqrt{n}\int\delta(z)[\bar f(z) - f_0(z)]\,dz = -\sqrt{n}\int[\bar\delta(z) - \delta(z)][\bar f(z) - f_0(z)]\,dz$.

Here the bias term that appears to be first order is actually second order, a result of
the bias correction being built into $\hat f$. Consequently, the bias of the functional
estimator can shrink faster than the pointwise bias, removing the need to undersmooth.
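
Idempotency of the series smoother can be seen in a small numerical sketch. It rests on assumptions made here and not in the paper: a cosine basis on $[0,1]$ rather than a basis on $\mathbb{R}^r$, and a midpoint grid on which that basis is orthonormal under the discrete inner product. All names are hypothetical.

```python
import numpy as np


def cosine_basis(u, J):
    """First J orthonormal cosine functions on [0, 1], one column per function."""
    j = np.arange(J)
    P = np.sqrt(2.0) * np.cos(np.pi * np.outer(u, j))
    P[:, 0] = 1.0  # the constant function has unit norm without the sqrt(2) factor
    return P


M, J = 400, 8
u = (np.arange(M) + 0.5) / M   # midpoint grid on [0, 1]
du = 1.0 / M
P = cosine_basis(u, J)

kappa = P @ P.T                # kappa(z, u) = p^J(z)' p^J(u) on the grid
kappa2 = kappa @ kappa * du    # integral of kappa(z, u) kappa(u, t) over u
twiced = 2.0 * kappa - kappa2  # kernel of the bootstrap corrected estimator
```

On this grid `kappa2` reproduces `kappa`, so `twiced` equals `kappa`: applying the bootstrap correction to the series estimator returns the same estimator.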

This example can be made precise by specifying an ISE rate of approximation
for $f_0(z)$ and $\delta(z)$, leading to the following result:

Theorem 3.2: If Assumption 1 is satisfied, $f_0(z)$ is bounded, $\{\int[\bar\delta(z) - \delta(z)]^2\,dz\}^{1/2} =
O(J^{-t/r})$, and $\{\int[\bar f(z) - f_0(z)]^2\,dz\}^{1/2} = O(J^{-s/r})$ then equation (2.4) is satisfied and

$B_n = \sum_{i=1}^{n}\{\bar\delta(z_i) - \delta(z_i)\}/\sqrt{n} = O_p(J^{-t/r} + \sqrt{n}J^{-s/r-t/r})$.

Furthermore, if $\sqrt{n}J^{-s/r-t/r} \to 0$ and $\sqrt{n}\|\hat F - F_0\|^2 \to 0$, then $\sqrt{n}(\hat\mu - \mu_0) = \sum_{i=1}^{n}\psi(z_i)/\sqrt{n} +
o_p(1)$.




The hypotheses of this result are not very primitive, but are consistent with the known
rate $s/r$ for approximation by orthogonal polynomials, where $s$ is the number of
continuous derivatives that exist. The conditions are specific enough to see that
undersmoothing will not be required for $\sqrt{n}$-consistency. Specifically, $\sqrt{n}\|\hat F - F_0\|^2 \to 0$
will hold if $\hat f$ converges slightly faster than $n^{-1/4}$, which will not require
undersmoothing. Also, the mean-square bias of $\hat f$ is $O(J^{-s/r})$ by hypothesis, but the
conditions only require that $\sqrt{n}J^{-s/r-t/r} \to 0$. Therefore if $t$ is large enough the
bias could be allowed to shrink at the same rate as the standard deviation without
affecting consistency. For instance, if $t \geq s$ then $n^{1/4}J^{-s/r} \to 0$, meaning the bias
shrinks faster than $n^{-1/4}$, will suffice for $\sqrt{n}J^{-s/r-t/r} \to 0$, and should be implied by
$\hat f$ converging faster than $n^{-1/4}$. We can obtain more primitive conditions in the case of
specific functionals. For example, the average density estimator $\mu(\hat f) = \int\hat f(z)^2\,dz$ is
$\sqrt{n}$-consistent if $\{\int[\bar f(z) - f_0(z)]^2\,dz\}^{1/2} = O(J^{-s/r})$ and $\sqrt{n}(J/n + J^{-2s/r}) \to 0$.

So  far  we  have  only  considered  the  case  where  the  smoothing  transformation     C     is 
linear.      When     C     is  nonlinear  analogous  results  should  also  hold:     A  bootstrap  smoothing 
correction  may  remove  the  need  for  undersmoothing  when  the  influence  function  has  enough 
derivatives,   and  this  correction  will  be  built  into  estimators  that  are  idempotent 
transformations  of  the  empirical  distribution.     These  results  could  be  shown  by  including 
an  expansion  of     C     in  the  analysis.     However,  this  would  greatly  complicate  the 
analysis,   so  we  content  ourselves  with  pointing  out  some  existing  examples  of  nonlinear, 
idempotent  transformations  where  it  is  known  that  undersmoothing  is  not  needed. 

Newey  (1994)  showed  that  undersmoothing  is  not  needed  for  functionals  of  a  series 
estimator  of  a  conditional  expectation.     This  occurs  because  series  estimators  of 
conditional  expectations  are  idempotent,   leading  to  a  corresponding  idempotent 
distribution  estimator.     For  brevity,  we  omit  details.     Shen  (1997)  has  also  shown  that 
undersmoothing  may  not  be  needed  for  sieve  density  estimators  when  the  influence  function 
is smooth enough. This also corresponds to an idempotent transformation. We can
describe a sieve estimator as the solution to

$\hat f = \arg\max_{f \in \mathcal{B}}\sum_{i=1}^{n}\ln f(z_i)/n = \arg\max_{f \in \mathcal{B}}\int\ln f(z)\hat P(dz)$,

where $\mathcal{B}$ is some restricted class of densities. Sieve estimators are of this form,
where $\mathcal{B}$ is some parametric family that also imposes other restrictions, such as
boundedness of higher order derivatives. It follows from the information inequality that

$\hat f = \arg\max_{f \in \mathcal{B}}\int[\ln f(z)]\hat f(z)\,dz$,

i.e. if we replace $\hat P$ by $\hat F$ in the transformation that gives $\hat F$ we obtain $\hat F$ again.
Thus, $C$ is idempotent, so the smoothing correction is built into a sieve estimator.

Higher order bootstrap smoothing corrections could also be carried out. Consider

$\tilde F_L = [I - (I - C)^L]\hat P$,


where $L$ is a positive integer. We have $\tilde F_1 = \hat F$ and $\tilde F_2 = \tilde F$, while $L > 2$ corresponds
to higher order bootstrap corrections. The estimator $\mu(\tilde F_L)$ will have an expansion like
equation (3.2), with $\tilde F_L$ replacing $\tilde F$. Assuming $C$ is linear as before, and letting $j$
be any integer, $0 \leq j \leq L$, the first-order bias term will be

$-\sqrt{n}\int[(I - C^*)^{j}\delta](z)[(I - C)^{L-j}F_0](dz)$,

which follows from $E[\tilde F_L] - F_0 = -(I - C)^L F_0$ by the same argument as for equation (3.3).


This represents a higher-order bias complementarity, where some of the smoothing bias is
shifted from $F_0$ to $\delta$. Of course, if $C$ is idempotent these higher-order bootstrap
corrections are built into the estimator, i.e. $\tilde F_L = \hat F$.
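
The algebra of the operator $I - (I - C)^L$ can be checked with finite-dimensional stand-ins. In this sketch a matrix plays the role of the linear smoother $C$; the random row-normalized matrix and the random projection are arbitrary assumptions for illustration, and `corrected_operator` is a hypothetical helper.

```python
import numpy as np


def corrected_operator(C, L):
    """Return I - (I - C)^L for a square matrix C standing in for a linear smoother."""
    identity = np.eye(C.shape[0])
    return identity - np.linalg.matrix_power(identity - C, L)


rng = np.random.default_rng(0)

# An arbitrary (non-idempotent) linear smoother: rows normalized to sum to one.
A = rng.random((6, 6))
C = A / A.sum(axis=1, keepdims=True)

# An idempotent smoother: orthogonal projection onto a 3-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))
proj = Q @ Q.T

first_order = corrected_operator(C, 1)       # equals C
second_order = corrected_operator(C, 2)      # equals 2C - C^2
projected_high = corrected_operator(proj, 4) # equals proj itself
```

With $L = 1$ the operator is $C$, with $L = 2$ it is the twicing operator $2C - C^2$, and for the projection every $L$ gives back the projection.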

We  also  note  that  it  is  possible  to  bootstrap  correct  the  functional,   forming  an 
estimator  as 


$\check\mu = \mu(\hat F) - [\mu(C\hat F) - \mu(\hat F)] = 2\mu(\hat F) - \mu(C\hat F)$.


If the functional were linear then $\check\mu = \mu(\tilde F)$, i.e. the estimator is the same
whether we bootstrap correct the functional or the density. In the general nonlinear
case the first-order linear terms in the expansion of equation (2.3) would be the same
for both estimators, so that $\check\mu$ and $\mu(\tilde F)$ should have the same first-order bias.


4.        Linear  Kernel  Averages 

Many  important  estimators  are  averages  of  functions  of  nonparametric  estimators  and 
data observations. Examples include the well known average density estimator
$\sum_{i=1}^{n}\hat f(x_i)/n$ and the weighted average derivative estimator of Powell, Stock, and Stoker
(1989).     The  bias  corrected  estimation  results  can  be  extended  to  cover  this  case,   and  we 
do  so  in  this  Section  and  the  following  one. 

Consider  a  parameter  of  interest 

$\mu_0 = E[g(z,F_0)] = \int g(z,F_0)F_0(dz)$,

where $g(z,F)$ is some known function, which may have a restricted domain as a function of
$F$. One way to estimate $\mu_0$ is to plug a nonparametric estimator $\hat F$ into $g(z,F)$ and
integrate over the empirical distribution to obtain

(4.1)  $\hat\mu = \int g(z,\hat F)\hat P(dz) = \sum_{i=1}^{n}g(z_i,\hat F)/n$.

One could also use $\hat F$ in place of $\hat P$, although the estimator $\hat\mu$ is often
computationally simpler.

To understand how a bias correction should be constructed for this estimator, let
$\mu(F) = E[g(z,F)] = \int g(z,F)F_0(dz)$. Then

(4.2)  $\hat\mu = \int g(z,F_0)\hat P(dz) + \mu(\hat F) - \mu(F_0) + S_n$,  $S_n = \int[g(z,\hat F) - g(z,F_0)](\hat P - F_0)(dz)$.

Here $S_n$ is a stochastic equicontinuity term that is second order, so to first order $\hat\mu$
is $\int g(z,F_0)\hat P(dz) + \mu(\hat F) - \mu(F_0)$. The term $\int g(z,F_0)\hat P(dz)$ is unbiased for $\mu_0$ but the
expectation of $\mu(\hat F) - \mu(F_0)$ may depart from zero due to smoothing inherent in $\hat F$ and to
nonlinearity in $F$. Therefore, to bias correct $\hat\mu$ we need to bias correct for the
functional $\mu(\hat F)$. This correction can be constructed by applying the analysis of Section
2. Suppose that $\mu(F) = \int g(z,F)F_0(dz)$ satisfies equation (2.3) and has influence
function $\delta(z) = \delta(z,F_0)$ and let $\hat\delta(z)$ denote an estimator of $\delta(z)$. The bias
correction is then $\int\hat\delta(z)(\hat P - \hat F)(dz)$ and the corresponding estimator is

(4.3)  $\tilde\mu = \hat\mu + \int\hat\delta(z)(\hat P - \hat F)(dz) = \hat\mu + \sum_{i=1}^{n}\hat\delta(z_i)/n - \int\hat\delta(z)\hat F(dz)$.

This correction is the same as in Section 2 except that $\hat\delta(z)$ estimates the influence
function of $\int g(z,F)F_0(dz)$.

It  is  also  possible  to  form  a  bias  corrected  estimator  by  applying  the  analysis  of 
Section 3. Plugging in a bootstrap corrected nonparametric estimator $\tilde F$ in $g(z,F)$
gives

(4.4)  $\bar\mu = \int g(z,\tilde F)\hat P(dz) = \sum_{i=1}^{n}g(z_i,\tilde F)/n$.

It  can  be  shown  by  results  analogous  to  those  of  Section  2  and  Section  3  that  the  need 
for  undersmoothing  can  be  removed  by  both  approaches  to  bias  correction.     For  brevity  we 
omit  this  general  analysis,   that  is  a  special  case  of  Section  5.     We  focus  here  on  the 
linear case, where $g(z,F)$ is linear in $F$ and $\hat F$ is linear, where we can derive
asymptotic mean-square error results. This allows us to quantify the variance effect of
the bias reduction, and includes several interesting examples.

We consider $g(z,F) = v\,\partial^{\lambda}\{E_F[y|x]f(x)\}$ where $v$ and $y$ are not elements of $x$.
One example is the average density, where $v = y = 1$ and $\lambda = 0$. Another is a density
weighted average derivative, where $\lambda$ is a unit vector and $y = 1$, so that by
integration by parts $\mu_0 = E[v\,\partial^{\lambda}f_0(x)] = E[E[v|x]\partial^{\lambda}f_0(x)] = -E[f_0(x)\partial^{\lambda}E[v|x]]/2$. A
third example is a density weighted conditional covariance, where $\mu_0 = E[vE[y|x]f_0(x)]$.
The estimators are obtained by substituting a kernel estimator for $\partial^{\lambda}\{E_F[y|x]f(x)\}$
and averaging over $z_i$. They have the form

(4.5)  $\hat\mu = \sum_{i=1}^{n}v_i[\partial^{\lambda}\sum_{j=1}^{n}K_h(x_i - x_j)y_j/n]/n = \sum_{i=1}^{n}\sum_{j=1}^{n}\partial^{\lambda}K_h(x_i - x_j)v_iy_j/n^2$.


These  types  of  estimators  have  been  considered  by  Hall  and  Marron  (1987),   Hardle  and 
Tsybakov  (1993),   Powell  and  Stoker  (1997),   and  others.     Our  contribution  is  to  show  how 
the  bias  complementarity  affects  the  mean-square  error  of  these  estimators. 

The estimator $\hat\mu$ has precisely the form in equation (4.1) for a certain $\hat F$. To
describe this form suppose that $z = (x,w)$ where $w$ includes $y$ and $v$, and let

(4.6)  $\hat F(z) = n^{-1}\sum_{i=1}^{n}1(w_i \leq w)\int 1(\tilde x \leq x)K_h(\tilde x - x_i)\,d\tilde x$.

The corresponding marginal density of $x$ is the kernel estimator $\hat f(x) = \sum_{i=1}^{n}K_h(x - x_i)/n$.
Also, for any function $a(w)$ the estimator of $E[a(w)|x]$ is $n^{-1}\sum_{i=1}^{n}a(w_i)K_h(x - x_i)/\hat f(x)$.
Therefore, the estimator of $\partial^{\lambda}\{E_F[y|x]f(x)\}$ is $\partial^{\lambda}\sum_{j=1}^{n}K_h(x - x_j)y_j/n$, so that
$\int g(z,\hat F)\hat P(dz)$ is equal to $\hat\mu$ in equation (4.5).

It turns out that with a symmetric kernel and a certain $\hat\delta(z)$, the bias corrected
estimator of equation (4.3) is identical to the bootstrap corrected estimator of equation
(4.4) based on the corresponding twicing kernel. To describe this result, apply Newey
(1994) to $\mu(F) = E[v\,\partial^{\lambda}\{E_F[y|x]f(x)\}]$ to obtain the influence function formula

(4.7)  $\delta(z) = \delta(z,F_0)$,  $\delta(z,F) = \ell(x,F)y$,  $\ell(x,F) = (-1)^{|\lambda|}\partial^{\lambda}\{E_F[v|x]f(x)\}$.

An estimator of this $\delta(z)$ obtained by plugging in the $\hat F$ from equation (4.6) is

$\hat\delta(z) = \delta(z,\hat F) = \ell(x,\hat F)y = (-1)^{|\lambda|}\partial^{\lambda}\{\sum_{i=1}^{n}K_h(x - x_i)v_i/n\}y = \partial^{\lambda}\{\sum_{i=1}^{n}K_h(x_i - x)v_i/n\}y$,

where the last equality follows by symmetry of the kernel. Using this equality the bias
corrected estimator of equation (4.3) will be

(4.8)  $\tilde\mu = \hat\mu + \sum_{j=1}^{n}\partial^{\lambda}\{\sum_{i=1}^{n}K_h(x_i - x_j)v_i/n\}y_j/n - \int\ell(x,\hat F)y\hat F(dz)$

$= 2\hat\mu - n^{-1}\sum_{i=1}^{n}y_i\int\ell(x,\hat F)K_h(x - x_i)\,dx = 2\hat\mu - n^{-1}\sum_{i=1}^{n}y_i\int\partial^{\lambda}\{\sum_{j=1}^{n}K_h(x_j - x)v_j/n\}K_h(x - x_i)\,dx$

$= 2\hat\mu - \sum_{i=1}^{n}\sum_{j=1}^{n}\partial^{\lambda}[\int K_h(x_j - u)K_h(u - x_i)\,du]y_iv_j/n^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\partial^{\lambda}\tilde K_h(x_i - x_j)v_iy_j/n^2$,

where $\tilde K(u) = 2K(u) - \int K(u-t)K(t)\,dt$ is the twicing kernel corresponding to $K(u)$. This
estimator has the same form as in equation (4.5), except $K$ is replaced by $\tilde K$. As noted
before, equation (4.5) is the same as equation (4.1) for $\hat F$ in equation (4.6), so
equation (4.8) will be the same as equation (4.4) when $\tilde F$ is as given in equation (4.6)
with the twicing kernel $\tilde K$ used in place of $K$. Thus, equation (4.8) shows that the
bias  corrected  estimator  of  equation  (4.3)   is  the  same  as  the  bootstrap  corrected 
estimator  of  equation  (4.4)  obtained  by  using  a  twicing  kernel. 
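As a minimal sketch of the twicing kernel construction, the following assumes the original kernel is the standard normal density in one dimension, so that the convolution in the definition is the N(0, 2) density in closed form. The function names and the integration grid are illustrative choices, not part of the paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density K(u), a symmetric second-order kernel."""
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def twicing_gaussian_kernel(u):
    """Twicing kernel 2K(u) - (K*K)(u) built from the standard normal K.

    The convolution of two N(0, 1) densities is the N(0, 2) density,
    so the integral in the definition has a closed form here.
    """
    u = np.asarray(u, dtype=float)
    convolution = np.exp(-0.25 * u**2) / np.sqrt(4.0 * np.pi)
    return 2.0 * gaussian_kernel(u) - convolution

# Riemann-sum checks on a wide, fine grid (illustrative settings).
grid = np.linspace(-12.0, 12.0, 240001)
step = grid[1] - grid[0]
k_tilde = twicing_gaussian_kernel(grid)
mass = np.sum(k_tilde) * step                      # integral of the kernel
second_moment = np.sum(grid**2 * k_tilde) * step   # vanishes for this kernel
fourth_moment = np.sum(grid**4 * k_tilde) * step   # first nonzero moment
```

In this Gaussian case the kernel integrates to one, its second moment is zero, and its fourth moment is 2(3) - 12 = -6, so the twicing kernel behaves as a fourth-order kernel even though the original kernel is second order.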

We could proceed to derive asymptotic mean-square error expressions for the estimator in equation (4.8), but it turns out that a mean-square error improvement can be obtained by deleting the "own observation" terms in $\tilde\mu$ and normalizing by the total number of terms in the sum (see Jones and Sheather, 1991). This modification leads to the estimator

(4.9)  $\tilde\mu_c = \sum_{i\neq j}\partial^{\lambda}\tilde K_h(x_i-x_j)v_iy_j/[n(n-1)].$

We will focus on results for this estimator because its "cross-validated" form is common in the literature and it has smaller asymptotic mean-square error than $\tilde\mu$. Inclusion of the own observation would affect the results by adding a term

$\partial^{\lambda}\tilde K_h(0)\sum_{i=1}^{n}y_iv_i/[n(n-1)] = [\partial^{\lambda}\tilde K(0)/(nh^{r+|\lambda|})](\sum_{i=1}^{n}y_iv_i/(n-1)).$

The order of this term will be $1/(nh^{r+|\lambda|})$, which is larger than the terms that appear in the asymptotic mean-square error derived below for $\tilde\mu_c$.

Some additional assumptions are useful in the mean-square error calculations. Let $\mu_{yy}(x) = E[y^2|x]f_0(x)$, $\mu_{vv}(x) = E[v^2|x]f_0(x)$, and $\mu_{vy}(x) = E[vy|x]f_0(x)$.

Assumption 5: $\mu_{vv}(x)$ and $\mu_{vy}(x)$ are continuous and for some $c > 0$, $\int\sup_{\|\Delta\|\le c}\mu_{yy}(x+\Delta)\mu_{vv}(x)dx$ and $\int\sup_{\|\Delta\|\le c}|\mu_{vy}(x+\Delta)\mu_{vy}(x)|dx$ are finite.


This  hypothesis  is  useful  for  controlling  the  variance  of  the  estimator.     The  next  two 
conditions  help  to  determine  the  bias. 

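Assumption 6: $a(x) = \partial^{\lambda}\{E[y|x]f_0(x)\}$ and $b(x) = E[v|x]f_0(x)$ exist and are continuously differentiable of order $s$ and $t \le s$ respectively, and for all multi-indices $\Lambda$ and $\tilde\Lambda$ with $|\Lambda| = s$ and $|\tilde\Lambda| = t$, there is a $c > 0$ such that

$\int\sup_{\|\Delta\|\le c}|\partial^{\Lambda}a(x+\Delta)|\sup_{\|\Delta\|\le c}|\partial^{\tilde\Lambda}b(x+\Delta)|dx < \infty.$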


When  combined  with  Assumption  2  this  means     s     is  the  minimum  of  the  number  of 
derivatives  of     a(x)     that  exist  and  the  order  of  the  kernel. 

The  next  assumption  imposes  some  further  useful  smoothness  and  moment  conditions. 

Assumption 7: $\ell(x) = (-1)^{|\lambda|}\partial^{\lambda}b(x)$ exists and is continuous and $a(x)$ and $\ell(x)$ are bounded, $E[y^2] < \infty$, and $E[v^2] < \infty$.

Under these conditions we can obtain the asymptotic mean-square error of the estimator $\tilde\mu_c$. Let $\psi(z) = a(x)v + \ell(x)y - E[a(x)v+\ell(x)y]$ be the influence function of $\int g(z,F)F(dz)$, $C_{\Lambda} = \int K(u)u^{\Lambda}du$,

(4.10)  $Q = \{\int[\partial^{\lambda}\tilde K(u)]^2du\}\int\{\mu_{yy}(x)\mu_{vv}(x)+(-1)^{|\lambda|}\mu_{vy}(x)^2\}dx,$

$B = \sum_{|\Lambda|=s,|\tilde\Lambda|=t}C_{\Lambda}C_{\tilde\Lambda}\int\partial^{\Lambda}a(x)\partial^{\tilde\Lambda}b(x)dx/(s!t!).$

Theorem 4.1: If Assumptions 2 and 5-7 are satisfied then as $n \to \infty$ and $h \to 0$,

(4.11)  $\mathrm{MSE}(\tilde\mu_c) = n^{-1}\mathrm{Var}[\psi(z)] + n^{-2}h^{-r-2|\lambda|}Q + h^{2s+2t}B^2 + o(h^{2s+2t}+n^{-2}h^{-r-2|\lambda|}+n^{-1}).$

The first term in this expression is the usual asymptotic variance under $\sqrt{n}$-consistency that will dominate if the other terms go to zero. The other two terms are variance and bias terms from kernel estimation. The estimator will be $\sqrt{n}$-consistent, with the usual asymptotic variance term dominating, if $n^{-1}h^{-r-2|\lambda|} \to 0$ and $nh^{2s+2t} \to 0$, meaning the pointwise variance of the kernel estimator goes to zero and the product of pointwise biases for $a(x)$ and $b(x)$ shrinks faster than $1/\sqrt{n}$. If the bandwidth is chosen to balance the variance and bias terms, so that $n^{-2}h^{-r-2|\lambda|}$ and $h^{2s+2t}$ are asymptotically proportional, $\tilde\mu_c$ will be $\sqrt{n}$-consistent if

(4.12)  $s + t \ge r/2 + |\lambda|.$

This condition is weaker than the requirement of Sections 2 and 3, which is made possible by the linearity of $g(z,F)$ in $F$ and the absence of the own observation term.
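The step from the balanced bandwidth to the smoothness condition can be written out as a short worked calculation. This is a sketch that uses only the two kernel terms of the MSE expansion in Theorem 4.1; the proportionality constants are suppressed.

```latex
% Balancing the kernel variance and squared bias terms:
n^{-2}h^{-r-2|\lambda|} \propto h^{2s+2t}
\quad\Longrightarrow\quad
h \propto n^{-2/(2s+2t+r+2|\lambda|)} .

% Both kernel terms, multiplied by n, then have the same order:
n\,h^{2s+2t} \;\propto\; n\cdot n^{-2}h^{-r-2|\lambda|}
\;\propto\; n^{(r+2|\lambda|-2s-2t)/(2s+2t+r+2|\lambda|)} .
```

The exponent is negative exactly when $s + t > r/2 + |\lambda|$, in which case both kernel terms are of smaller order than $n^{-1}$. In the boundary case $s + t = r/2 + |\lambda|$ the exponent is zero, so the kernel terms are of the same order as the leading $n^{-1}$ term.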

Equation (4.12) allows equality rather than the strict inequality of equations (2.10) and (3.6), which is possible because uniform convergence rates for $\hat F$ are not used here. However, $s + t = r/2 + |\lambda|$ is a knife edge case where the variance remainder term in equation (4.11) will be $Qn^{-1}$, the same size as the leading term. In this case $\sqrt{n}(\tilde\mu_c-\mu_0)$ will not be asymptotically normal with variance $\mathrm{Var}(\psi(z))$, so the usual functional asymptotic theory does not apply.

We can compare the MSE in Theorem 4.1 to the MSE for the estimator based on the original kernel to evaluate the effect of the bias correction. Let $\bar B = \sum_{|\Lambda|=s}C_{\Lambda}\int b(x)\partial^{\Lambda}a(x)dx/(s!)$. Then by Powell and Stoker (1997), the estimator $\hat\mu_c$ obtained from equation (4.9) by replacing $\tilde K(u)$ by $K(u)$ has

(4.13)  $\mathrm{MSE}(\hat\mu_c) = n^{-1}\mathrm{Var}[\psi(z)] + n^{-2}h^{-r-2|\lambda|}\{\int[\partial^{\lambda}K(u)]^2du/\int[\partial^{\lambda}\tilde K(u)]^2du\}Q + h^{2s}\bar B^2 + o(h^{2s}+n^{-2}h^{-r-2|\lambda|}+n^{-1}).$


The comparison of the MSE of $\tilde\mu_c$ and $\hat\mu_c$ is partly analogous to a comparison between the pointwise MSE of the corresponding kernel estimators. The ratio of kernel variance terms for $\hat\mu_c$ and $\tilde\mu_c$ is $\int[\partial^{\lambda}K(u)]^2du/\int[\partial^{\lambda}\tilde K(u)]^2du$, which is exactly the same as the ratio of pointwise variance terms for estimation of $\partial^{\lambda}f_0(x)$ at a point, with $\int[\partial^{\lambda}\tilde K(u)]^2du$ known to be larger than $\int[\partial^{\lambda}K(u)]^2du$. On the other hand, because of the bias complementarity, the bias term is not reduced in exactly the same way as for pointwise estimation. The constant in the bias term depends on the $t$th derivatives of $b(x)$ rather than the $2s$th order derivatives of $a(x)$. This bias reduction is better in some ways than the pointwise bias reduction from a higher order kernel. In particular, it may require no additional derivatives to exist when smoothness of $a(x)$ implies smoothness of $b(x)$ (e.g. as for the average density).
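To give a feel for the size of the variance constants, the following sketch evaluates the two integrals numerically for one special case that is assumed here for illustration: a one-dimensional standard normal kernel and no derivative in the functional (so the derivative operator is the identity). The grid and variable names are illustrative.

```python
import numpy as np

# Integration grid wide enough that the Gaussian tails are negligible.
grid = np.linspace(-15.0, 15.0, 300001)
step = grid[1] - grid[0]

# Standard normal kernel K and its twicing kernel 2K - K*K,
# where K*K is the N(0, 2) density.
k = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
k_tilde = 2.0 * k - np.exp(-0.25 * grid**2) / np.sqrt(4.0 * np.pi)

int_k_sq = np.sum(k**2) * step              # integral of K(u)^2
int_k_tilde_sq = np.sum(k_tilde**2) * step  # integral of the twicing kernel squared
variance_ratio = int_k_tilde_sq / int_k_sq
```

In this special case the ratio is about 1.44, so the twicing kernel inflates the kernel variance term by a constant factor while improving the order of the bias term.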

The asymptotic mean-square error (MSE) given in Theorem 4.1 can be used to obtain an optimal bandwidth formula for estimation of $\tilde\mu_c$, when $t = s$ and the kernel is of order $s$, with $B \neq 0$. Minimizing the sum of the second and third terms in equation (4.11) over $h$ gives

(4.14)  $h = [Q(r+2|\lambda|)/(B^2 4sn^2)]^{1/(4s+r+2|\lambda|)}.$


This  bandwidth  is  optimal,   in  the  sense  of  minimizing  the  leading  terms  in  the  MSE  of  the 
estimator. 
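The bandwidth formula can be sketched as a small function. The constants passed in below are hypothetical placeholders chosen only to exercise the code; in practice they would have to be derived or estimated for the functional at hand. The helper `leading_terms` is likewise an illustrative name for the sum of the kernel variance and squared bias terms with $t = s$.

```python
def optimal_bandwidth(q_const, b_const, n, r, s, lam_order=0):
    """Bandwidth of equation (4.14), for the case t = s.

    Minimizes n**-2 * h**-(r + 2*lam_order) * Q + h**(4*s) * B**2 over h > 0.
    """
    if q_const <= 0 or b_const == 0 or n <= 1:
        raise ValueError("need Q > 0, B != 0 and n > 1")
    d = r + 2 * lam_order
    return (q_const * d / (b_const**2 * 4 * s * n**2)) ** (1.0 / (4 * s + d))

def leading_terms(h, q_const, b_const, n, r, s, lam_order=0):
    """Kernel variance term plus squared bias term, with t = s."""
    d = r + 2 * lam_order
    return q_const * h ** (-d) / n**2 + b_const**2 * h ** (4 * s)

# Hypothetical constants, for illustration only.
h_star = optimal_bandwidth(q_const=0.5, b_const=1.2, n=200, r=1, s=2)
value_at_h_star = leading_terms(h_star, 0.5, 1.2, 200, 1, 2)
```

Because the objective has a single interior stationary point for $h > 0$, perturbing `h_star` in either direction should increase `leading_terms`.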

It is interesting to note that although undersmoothing is not required for $\sqrt{n}$-consistency, undersmoothing is still optimal for $\tilde\mu_c$. The optimal bandwidth converges faster to zero than a bandwidth that minimizes the asymptotic mean-square error of $\hat a(x)$. This feature seems to be specific to linear functional estimators with the own observation deleted. If the own observation were included then there would be an $n^{-2}h^{-2r-2|\lambda|}$ term in the MSE, which dominates the $n^{-2}h^{-r-2|\lambda|}$ term, and will make the rate for the optimal bandwidth for functional estimation the same as for nonparametric estimation. Also, when the functional is nonlinear there is an additional remainder term $R(\tilde F-F_0,F_0)$ as discussed in Section 3, which is of order $\|\tilde F-F_0\|^2$ under Assumption 1. This should make the MSE of no smaller order than $\|\tilde F-F_0\|^4 = O_p(\max\{n^{-2}h^{-2r-4d},h^{4s}\})$, where undersmoothing would not be optimal.

The optimal bandwidth formula in equation (4.14) is potentially useful for choosing the bandwidth in practice. The constants $Q$ and $B$ are unknown, but could be estimated by replacing the unknown nonparametric terms in their formulae by kernel estimates to obtain $\hat Q$ and $\hat B$, which could then be used in place of $Q$ and $B$ in the optimal bandwidth formula to form an estimate.

An interesting example is the density weighted average derivative estimator described above, where $|\lambda| = 1$ and $y = 1$. Here the influence function for $\tilde\mu_c$ will be $\psi(z) = \partial^{\lambda}f_0(x)\{v-E[v|x]\} - f_0(x)\partial^{\lambda}E[v|x] - 2\mu_0$ as derived in Powell, Stock, and Stoker (1989). Suppose that $f_0(x)$ is $s+1$ times differentiable and $E[v|x]$ is $t$ times differentiable, that the original kernel has order at least $s$, and all the following integrals exist, and let

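$Q = \{\int[\partial^{\lambda}\tilde K(u)]^2du\}\int\mathrm{Var}(v|x)f_0(x)^2dx,$

$B = \sum_{|\Lambda|=s,|\tilde\Lambda|=t}C_{\Lambda}C_{\tilde\Lambda}\int[\partial^{\Lambda}\partial^{\lambda}f_0(x)]\partial^{\tilde\Lambda}\{E[v|x]f_0(x)\}dx/(s!t!).$

Then, the conclusion of Theorem 4.1 implies that as $n \to \infty$ and $h \to 0$,

(4.15)  $\mathrm{MSE}(\tilde\mu_c) = \mathrm{Var}[\psi(z)]/n + n^{-2}h^{-r-2}Q + h^{2s+2t}B^2 + o(n^{-1}+n^{-2}h^{-r-2}+h^{2s+2t}).$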


In this example the optimal bandwidth with $s = t$ is

$h = [Q(r+2)/(B^2 4sn^2)]^{1/(4s+r+2)}.$

For example, with a normal kernel, $s = t = 2$, $r < 6$, and the bandwidth chosen optimally, the bias corrected weighted average derivative will be $\sqrt{n}$-consistent with the asymptotic variance $\mathrm{Var}(\psi(z))$. If $r = 6$ then the estimator would still be $\sqrt{n}$-consistent but the leading term in equation (4.15) would not dominate, so that the limiting distribution would not be the usual one for a semiparametric estimator. In comparison with Powell, Stock, and Stoker (1989) this result imposes weaker conditions on the smoothness of the density at the expense of requiring some smoothness of the regression $E[v|x]$.

The use of a twicing kernel can lead to an improvement even when the density is not differentiable. To illustrate, consider the case where $\lambda = 0$ and $a(x) = E[y|x]f_0(x)$ and $b(x) = \ell(x) = E[v|x]f_0(x)$ satisfy a Hölder condition:

Assumption 8: There are $\xi, t > 0$, $C_a(x)$, and $C_b(x)$ such that $\int C_a(x)C_b(x)dx < \infty$ and for all $\|\tilde x-x\|$ small enough, $|a(\tilde x)-a(x)| \le C_a(x)\|\tilde x-x\|^{\xi}$ and $|b(\tilde x)-b(x)| \le C_b(x)\|\tilde x-x\|^{t}$.

With  this  condition  replacing  Assumption  6  the  following  result  holds. 

Theorem 4.2: If Assumptions 2, 5, 7, and 8 are satisfied then as $n \to \infty$ and $h \to 0$,

$\mathrm{MSE}(\tilde\mu_c) = \mathrm{Var}[\psi(z)]/n + O(n^{-2}h^{-r}) + O(h^{2\xi+2t}).$

The order of the bias is again the product of pointwise bias orders $h^{\xi}$ and $h^{t}$. If the bandwidth is chosen so that $n^{-2}h^{-r}$ and $h^{2\xi+2t}$ are asymptotically proportional then the estimator is $\sqrt{n}$-consistent if $\xi + t \ge r/2$. This result is similar to the condition in equation (4.12) for the differentiable case, but only requires a Hölder condition rather than derivatives. This result shows that the bias complementarity available with the twicing kernel is not solely an artifact of the higher order property of the twicing kernel.

We can illustrate the results we have derived so far by comparing different average density estimators. If we regard $\mu(F) = \int f(z)^2dz$ as a nonlinear functional and derive a bias correction from the influence function estimate $\hat\delta(z) = 2\hat f(z)$, we obtain the estimator of equation (2.6), $\tilde\mu_1 = 2\sum_{i=1}^{n}\hat f(z_i)/n - \int\hat f(z)^2dz$. Plugging in a bootstrap corrected density estimator $\tilde f(z)$ gives

$\tilde\mu_2 = \int\tilde f(z)^2dz,$

that will generally be different from $\tilde\mu_1$. Regarding $\int f_0(z)^2dz$ as the expectation of the density, where $\mu_0 = E[g(z,F_0)]$ for $g(z,F) = f(z)$, and noting that $E[g(z,F)]$ has influence function $f_0(z)$ that can be estimated by $\hat f(z)$, the bias corrected estimator of equation (4.3) becomes

$\tilde\mu_3 = \sum_{i=1}^{n}\hat f(z_i)/n + \int\hat f(z)(P-\hat F)(dz) = 2\int\hat f(z)P(dz) - \int\hat f(z)^2dz = \tilde\mu_1.$

Also, the bias corrected estimator of equation (4.4) will be

$\tilde\mu_4 = \sum_{i=1}^{n}\tilde f(z_i)/n.$

It follows from equation (4.8) that $\tilde\mu_3 = \tilde\mu_4$ if $\hat f$ and $\tilde f$ are kernel estimators where $\tilde f$ is based on the twicing kernel corresponding to $\hat f$. Here we see that three of the

bias corrected estimators are equal, with the different one being the nonlinear integral $\int\tilde f(z)^2dz$ of a bias corrected nonparametric estimator. The conclusion of Theorem 2.2 gives $\sqrt{n}$-consistency of $\tilde\mu_1$ if the original kernel has order at least $s$, $f_0(z)$ is $s$ times differentiable, $s > r/2$, and $\sqrt{n}h^{r}/\ln(n) \to \infty$, $\sqrt{n}h^{2s} \to 0$.
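The equality between the influence-function corrected average density estimator and the average of a twicing-kernel density estimate can be checked numerically. The sketch below assumes a one-dimensional standard normal kernel, a simulated sample, and an arbitrary bandwidth; these choices, and all variable names, are illustrative only. Both estimators include the own-observation terms here.

```python
import numpy as np

def normal_pdf(u, var=1.0):
    """Density of N(0, var) evaluated at u."""
    return np.exp(-0.5 * u**2 / var) / np.sqrt(2.0 * np.pi * var)

rng = np.random.default_rng(0)
z = rng.standard_normal(40)   # illustrative sample, dimension one
h = 0.6                       # arbitrary illustrative bandwidth
diff = (z[:, None] - z[None, :]) / h

# Original-kernel density estimate at the sample points.
f_hat_at_z = normal_pdf(diff).mean(axis=1) / h

# Integral of the squared density estimate by a Riemann sum on a fine grid.
grid = np.linspace(z.min() - 10 * h, z.max() + 10 * h, 20001)
step = grid[1] - grid[0]
f_hat_grid = normal_pdf((grid[:, None] - z[None, :]) / h).mean(axis=1) / h
integral_f_hat_sq = np.sum(f_hat_grid**2) * step

# Additive bias correction: twice the sample average minus the integral.
mu_corrected = 2.0 * f_hat_at_z.mean() - integral_f_hat_sq

# Average of the twicing-kernel density estimate over the sample points;
# the convolution of two N(0, 1) kernels is the N(0, 2) density.
k_tilde = 2.0 * normal_pdf(diff) - normal_pdf(diff, var=2.0)
mu_twicing = k_tilde.mean() / h
```

The two numbers agree up to quadrature error, which is the content of the equality derived from equation (4.8) in this special case.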

A further reduction in MSE of the bias corrected kernel estimator $\tilde\mu_4$ is possible by deleting the own observation to obtain

(4.16)  $\tilde\mu_c = \sum_{i\neq j}\tilde K_h(z_i-z_j)/[n(n-1)].$

Applying Theorem 4.1, this estimator will be $\sqrt{n}$-consistent if $s \ge r/4$ and the bandwidth is chosen as in equation (4.14). Furthermore, by Theorem 4.2, if $a(x) = b(x) = f_0(z)$ satisfies Assumption 8 then $\tilde\mu_c$ will still be $\sqrt{n}$-consistent if $(\xi+t)/2 = \xi \ge r/4$ and the other conditions described above are satisfied. For example, if $r = 1$ and the bandwidth is chosen so that $n^{-2}h^{-r}$ is proportional to $h^{4\xi}$, then $\tilde\mu_c$ is $\sqrt{n}$-consistent if $\xi \ge 1/4$. Here $\xi = 1/4$ is a knife edge case where the variance remainder term will be the same size as the leading $\mathrm{Var}(\psi(z))/n$ term. The condition $\xi \ge 1/4$ was shown by Bickel and Ritov (1988) to be necessary for existence of a $\sqrt{n}$-consistent (regular) estimator. Thus, the twicing kernel estimator $\tilde\mu_c$ attains $\sqrt{n}$-consistency under minimal conditions, like the more complicated sample splitting estimator of Bickel and Ritov (1988).
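A minimal sketch of the own-observation-deleted average density estimator follows. It assumes a product standard normal kernel in $r$ dimensions, for which the twicing kernel is available in closed form; the function name, the simulated data, and the bandwidth are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def average_density_twicing(z, h):
    """Leave-one-out average density estimate with a Gaussian twicing kernel.

    z: (n,) or (n, r) array of observations; h: scalar bandwidth.
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    n, r = z.shape
    diff = (z[:, None, :] - z[None, :, :]) / h
    sq_dist = np.sum(diff**2, axis=2)
    # N(0, I_r) kernel and its self-convolution, the N(0, 2 I_r) density.
    k = np.exp(-0.5 * sq_dist) / (2.0 * np.pi) ** (r / 2.0)
    k_conv = np.exp(-0.25 * sq_dist) / (4.0 * np.pi) ** (r / 2.0)
    k_tilde_h = (2.0 * k - k_conv) / h**r
    np.fill_diagonal(k_tilde_h, 0.0)   # delete the own-observation terms
    return k_tilde_h.sum() / (n * (n - 1))

# Illustrative use on simulated standard normal data in one dimension.
rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 1))
estimate = average_density_twicing(sample, h=0.5)
```

For standard normal data in one dimension the target is $\int f_0(z)^2dz = 1/(2\sqrt{\pi}) \approx 0.282$, which gives a simple sanity check on the output.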

To illustrate the potential efficiency gains from using twicing kernels we have done some exact MSE comparisons for different estimators of the average density when the true density is a standard normal and a standard normal kernel is used to construct the estimator. Specifically, we have calculated exact MSE for the estimator in equation (4.16) when $z_i$ are i.i.d. with $N(0,I)$ distribution and $K(u)$ is either the standard normal density or the twicing kernel based on the standard normal. The bandwidths have been chosen to minimize the asymptotic MSE for the respective estimators as in equation (4.14) for the twicing kernel and the analogous equation for the standard normal kernel (with corresponding larger bias term).

Table One gives the sample sizes at which the MSE for the twicing kernel estimator becomes smaller than the MSE of the normal kernel.

Table One: MSE Crossover

Dimension    Sample size
1            18
2            17
3            19
4            21
5            25
6            31
7            39
8            52

Figure One graphs the ratio of MSE as a function of sample size for dimension $r = 1$ up to dimension $r = 4$. For dimension $r > 4$ the standard normal kernel estimator will not be $\sqrt{n}$-consistent but the twicing kernel estimator will, up to dimension $r = 8$. We restricted attention to the $r \le 4$ case because the MSE ratio would asymptote to zero for higher dimensions. These graphs show persistent MSE gains.

Figure Two presents graphs of the MSE as a function of bandwidth for dimension $r = 1$ and $r = 3$ and for sample sizes 50, 100, and 200. These graphs allow us to compare the sensitivity of the MSE to the choice of bandwidth for the standard normal and twicing kernels. Although the MSE of the twicing kernel can turn sharply upward at low bandwidths, we find that it is flat, and close to its minimum, over a wide range of bandwidths. In this sense the MSE for the twicing kernel seems less sensitive to the choice of bandwidth than that for the original kernel.


5.        Semiparametric  M-estimation 

Estimators  that  solve  an  estimating  equation  with  a  nonparametric  component  have 
many  interesting  applications  and  include  as  special  cases  all  the  ones  we  have 
considered  so  far.     This  class  also  includes  profile  likelihood  estimators  and  many 
others, e.g. see Bickel et al. (1990). We refer to this class of estimators as
semiparametric  m-estimators.     In  this  Section  we  develop  bias-corrected  versions  of  these 
estimators.

To describe a semiparametric m-estimator, let $\beta$ denote a $q \times 1$ parameter vector, $\hat F$ and $F_0$ be as previously discussed, and $m(z,\beta,F)$ a $q \times 1$ vector of functions. Let $\hat\beta$ solve

(5.1)  $\sum_{i=1}^{n}m(z_i,\beta,\hat F)/n = 0.$

This includes as special cases $\mu(\hat F)$, where $m(z,\beta,F) = \beta - \mu(F)$, and $\sum_{i=1}^{n}g(z_i,\hat F)/n$, where $m(z,\beta,F) = \beta - g(z,F)$.

To obtain bias corrections it is useful to begin by expanding the moment equation around the true value of the parameter $\beta_0$. Suppose that $m(z,\beta,F)$ is differentiable in $\beta$. Then expanding and solving for $\hat\beta$ gives

(5.2)  $\sqrt{n}(\hat\beta-\beta_0) = -\bar M^{-1}\sum_{i=1}^{n}m(z_i,\hat F)/\sqrt{n}$,  $\bar M = n^{-1}\sum_{i=1}^{n}\partial m(z_i,\bar\beta,\hat F)/\partial\beta$,  $m(z,F) = m(z,\beta_0,F)$,

where $\bar\beta$ is a mean value that lies on a line joining $\hat\beta$ and $\beta_0$ and actually differs from element to element of $m$. Asymptotic normality and $\sqrt{n}$-consistency of $\hat\beta$ will follow from consistency of $\bar M$ for $M = E[\partial m(z,\beta_0,F_0)/\partial\beta]$, nonsingularity of $M$, and asymptotic normality of $\sum_{i=1}^{n}m(z_i,\hat F)/\sqrt{n}$. The consistency of $\bar M$ is a straightforward property that is generally an implication of consistency of $\hat\beta$ and $\hat F$ and a uniform law of large numbers. Asymptotic normality of $\sum_{i=1}^{n}m(z_i,\hat F)/\sqrt{n}$ is where undersmoothing and bias corrections may be important.

Bias corrections for $\hat\beta$ can be obtained by applying the analysis of Section 4 to $g(z,F) = m(z,F)$ and accounting for the Jacobian term. The first approach is by an influence function correction like that of equation (4.3). Let $\delta(z)$ be the influence function for $\mu(F) = \int m(z,F)F_0(dz)$, and let $\hat\delta(z)$ be an estimator of this influence function. Also, let $\hat M = n^{-1}\sum_{i=1}^{n}\partial m(z_i,\hat\beta,\hat F)/\partial\beta$. Then a bias corrected estimator is given by

(5.3)  $\tilde\beta = \hat\beta - \hat M^{-1}[\sum_{i=1}^{n}\hat\delta(z_i)/n - \int\hat\delta(z)\hat F(dz)] = \hat\beta - \hat M^{-1}\int\hat\delta(z)(P-\hat F)(dz).$

This estimator has a form like that in equation (4.3), except that the Jacobian estimator is included to account for the presence of $\bar M$ in equation (5.2).

To see how the correction affects the estimator, let $\psi(z) = m(z,F_0) + \delta(z) - E[\delta(z)]$ be the influence function of $\int m(z,F)F(dz)$. Then

(5.4)  $\sqrt{n}(\tilde\beta-\beta_0) = -\bar M^{-1}[\sum_{i=1}^{n}\psi(z_i)/\sqrt{n} + \sqrt{n}(R_{1n}+R_{2n}+R_{3n})] + R_{4n},$

$R_{1n} = \int[m(z,\hat F)-m(z,F_0)](P-F_0)(dz)$,  $R_{2n} = \int[\hat\delta(z)-\delta(z)](P-\hat F)(dz)$,

$R_{3n} = \int[m(z,\hat F)-m(z,F_0)]F_0(dz) - \int\delta(z)(\hat F-F_0)(dz)$,  $R_{4n} = \sqrt{n}(\bar M^{-1}-\hat M^{-1})\int\hat\delta(z)(P-\hat F)(dz).$

The remainder terms in equation (5.4) are all second-order, so that $\sqrt{n}$-consistency of $\tilde\beta$ should not require undersmoothing, although precise regularity conditions are difficult to specify at the level of generality considered here.

One important property of semiparametric estimators follows immediately from equation (5.4). Suppose that $\delta(z) = 0$, meaning that estimation of $F$ does not affect the asymptotic distribution of $\hat\beta$. Then $\hat\delta(z) = 0$ is consistent, and the bias corrected estimator would be the original estimator. Consequently, the original estimator should not need undersmoothing. Thus, we find that if the presence of $\hat F$ does not affect the large sample distribution of $\hat\beta$ no undersmoothing will be needed.

There are many interesting semiparametric estimators where estimation of $F$ does not affect the asymptotic variance, and so undersmoothing may not be needed. As shown in Newey (1994), if $m(z,\beta,F)$ is the gradient of a function $q(z,\beta,F)$ and $\hat F$ is an estimator of the "profile" distribution that maximizes $E[q(z,\beta,F)]$ then estimation of $F$ does not affect the asymptotic variance. Hence, undersmoothing may not be needed for any of these estimators. Many known semiparametric estimators are special cases of this result, including Robinson (1988), Chen and Shiau (1991), and Ichimura (1993). Also, if $m(z,\beta,F)$ is an efficient score for $\beta$ in a semiparametric model and $\hat F$ is an estimator that imposes the restrictions implied by the model then the limiting distribution of $\hat\beta$ is not affected by estimation of $F$ when the model is true, as has been demonstrated in many examples in the literature. Hence, undersmoothing may not be needed when $m(z,\beta,F)$ is an efficient score and the model is correct.

The second approach to bias correction is to use a bootstrap corrected estimator $\tilde F$ in the formation of the estimator of $\beta$, choosing $\tilde\beta$ as the solution to

(5.5)  $\sum_{i=1}^{n}m(z_i,\beta,\tilde F)/n = 0.$

This estimator should not need undersmoothing because in the expansion of equation (5.2) $\tilde F$ will replace $\hat F$ in the average $\sum_{i=1}^{n}m(z_i,\hat F)/n$. As discussed in Section 3, any estimator $\hat F$ that is an idempotent transformation of the empirical distribution, such as a sieve estimator or a series estimator of a conditional expectation, has this correction built in, so undersmoothing may not be required for $\sqrt{n}$-consistency of $\hat\beta$.

We could derive precise results on bias-corrected m-estimation for both the estimators of equation (5.2) and (5.5). However, because the ideas here are basically the same as in Sections 2 and 3, we choose to be brief, and consider only the estimator of equation (5.5), focusing on conditions for $\sqrt{n}$-consistency when $\tilde F$ is a twicing kernel estimator.

Newey and McFadden (1994) have already given general results on $\sqrt{n}$-consistency of semiparametric kernel estimators. We build on their results, deriving a corresponding result for twicing kernels, showing that undersmoothing is not needed. We adopt their specification for $m(z,\beta,F)$, where $m$ depends on $F$ only through $\gamma(x) = E_F[w|x]f(x)$, where $w$ is a vector of random variables that are not elements of $x$. A twicing kernel estimator of the true function $\gamma_0(x) = E[w|x]f_0(x)$ would be

$\tilde\gamma(x) = \sum_{i=1}^{n}\tilde K_h(x-x_i)w_i/n.$
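A twicing kernel estimate of this kind can be sketched in a few lines. The version below assumes a scalar regressor and a standard normal original kernel; the function names are hypothetical and the code is meant only to illustrate the weighting scheme.

```python
import numpy as np

def twicing_gaussian(u):
    """Twicing kernel 2K - K*K for the standard normal K (K*K is N(0, 2))."""
    u = np.asarray(u, dtype=float)
    return (2.0 * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
            - np.exp(-0.25 * u**2) / np.sqrt(4.0 * np.pi))

def gamma_tilde(x_eval, x, w, h):
    """Twicing-kernel estimate of E[w|x] f_0(x) for scalar x.

    x_eval: (m,) evaluation points; x: (n,) regressors;
    w: (n,) or (n, k) array. Returns an (m,) or (m, k) array.
    """
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # (m, n) matrix of scaled kernel weights.
    weights = twicing_gaussian((x_eval[:, None] - x[None, :]) / h) / h
    return weights @ w / x.size

# Illustrative use: with w identically one, the estimate is a density estimate.
rng = np.random.default_rng(0)
x_sample = rng.standard_normal(4000)
density_at_zero = gamma_tilde(0.0, x_sample, np.ones(x_sample.size), h=0.4)[0]
```

Setting every element of `w` to one reduces the estimate to a twicing kernel density estimate, which for standard normal data should be close to the normal density at the evaluation point.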

Letting $m(z,\beta,\gamma)$ be a $q \times 1$ vector of functions that depends on $\beta$ and the function $\gamma(\cdot)$, a kernel semiparametric bootstrap corrected m-estimator would be $\tilde\beta$ solving

$0 = \sum_{i=1}^{n}m(z_i,\beta,\tilde\gamma)/n.$

To specify regularity conditions we modify the norm $\|\cdot\|$ to apply to $\gamma$. Let $\mathcal{X}$ denote a compact set and

$\|\gamma\| = \sup_{|\lambda|\le d}\sup_{x\in\mathcal{X}}\|\partial^{\lambda}\gamma(x)\|.$

This is a Sobolev norm like the one used for the kernel results of Sections 2 and 3.

Assumption 9: $\gamma_0(x)$ is continuously differentiable to order $s$ with bounded derivatives and for some $c > 0$ and all $\lambda$ with $|\lambda| = s$, $\int\sup_{\|\Delta\|\le c}|\partial^{\lambda}\gamma_0(x+\Delta)|dx < \infty$.

Let $m(z,\gamma) = m(z,\beta_0,\gamma)$.

Assumption 10: There are $b(z)$ and $D(z,\gamma)$ that are linear in $\gamma$ such that for all $\gamma$ with $\|\gamma-\gamma_0\|$ small enough, $\|m(z,\gamma)-m(z,\gamma_0)-D(z,\gamma-\gamma_0)\| \le b(z)\|\gamma-\gamma_0\|^2$, $\|D(z,\gamma)\| \le b(z)\|\gamma\|$, and $E[b(z)] < \infty$. Also, there is a matrix $\nu(x)$ with $E[D(z,\gamma-\gamma_0)] = \int\nu(x)\{\gamma(x)-\gamma_0(x)\}dx$, where $\nu(x)$ is continuously differentiable of order $t$ with bounded derivatives. Also, for $p > 2$, $E[\|w\|^p] < \infty$ and $h = h(n)$ with $h \to 0$ and $n^{1-(2/p)}h^{r}/\ln(n) \to \infty$.


Let $\bar\nu(x) = \int\tilde K(u)\nu(x+hu)du$, $\rho_n = \ln(n)^{1/2}/\sqrt{nh^{r+2d}} + h^{s}$,

$B_n = \sqrt{n}\int[\bar\nu(x)-\nu(x)]wP(dz)$,  $R_n = \sqrt{n}\int[m(z,\tilde F)-m(z,F_0)-D(z,\tilde F-F_0)]F_0(dz)$,

$S_n = \sqrt{n}\int[m(z,\tilde F)-m(z,F_0)](P-F_0)(dz).$

We first give a result that bounds the remainder terms in the expansion for the average of $m(z,\tilde\gamma)$.

Theorem 5.1: If Assumptions 2, 9, and 10 are satisfied then for $\psi(z) = m(z,\gamma_0) + \nu(x)w - E[\nu(x)w]$,

$\sum_{i=1}^{n}m(z_i,\tilde\gamma)/\sqrt{n} = \sum_{i=1}^{n}\psi(z_i)/\sqrt{n} + B_n + S_n + R_n,$

$B_n = O_p(\sqrt{n}h^{s+t} + h^{t})$,  $S_n = O_p(\sqrt{n}\rho_n^2 + h^{s})$,  $R_n = O_p(\sqrt{n}\rho_n^2).$

Furthermore, if $\nu(x) = 0$ then this result also holds with $K(u)$ replacing $\tilde K(u)$.

To show a corresponding $\sqrt{n}$-consistency result for the semiparametric estimator we need to make assumptions that ensure convergence of the Jacobian term $\bar M$. The next condition suffices.

Assumption 11: $\tilde\beta \to_p \beta_0$, for $\beta$ in a neighborhood $\mathcal{B}$ of $\beta_0$ and all $\gamma$ with $\|\gamma-\gamma_0\|$ small enough, $m(z,\beta,\gamma)$ is continuously differentiable and for some $c > 0$, $\|\partial m(z,\beta,\gamma)/\partial\beta - \partial m(z,\beta_0,\gamma_0)/\partial\beta\| \le b(z)(\|\beta-\beta_0\|^{c} + \|\gamma-\gamma_0\|^{c})$, $M = E[\partial m(z,\beta_0,\gamma_0)/\partial\beta]$ exists and is nonsingular, $m(z,\gamma_0)$ has mean zero and finite second moments.

Theorem 5.2: If Assumptions 2 and 9-11 are satisfied, $\sqrt{n}\rho_n^2 \to 0$, and $\sqrt{n}h^{s+t} \to 0$ then $\sqrt{n}(\tilde\beta-\beta_0) \to_d N(0,M^{-1}\mathrm{Var}(\psi(z))M^{-1\prime})$. Furthermore, if $\nu(x) = 0$ then the same result holds with $K(u)$ replacing $\tilde K(u)$.

When $\nu(x)$ is nonzero the conditions for $\sqrt{n}$-consistency of this estimator are exactly analogous to those discussed following Theorem 3.1. In particular, undersmoothing will not be needed if $t > r/2 + d$. The case where $\nu(x) = 0$ corresponds to estimation of $\gamma$ having no effect on the asymptotic variance of $\tilde\beta$. As previously discussed, no bias correction is needed for this case: if $\sqrt{n}\rho_n^2 \to 0$ and the kernel is the original (non-twicing) one then $\sqrt{n}$-consistency follows from Theorem 5.2.


6.        Conclusion

In this paper we have shown that it is often possible to use a nonparametric estimator where the degree of smoothing is optimal (MSE minimizing) for estimation of the function to construct a $\sqrt{n}$-consistent estimator of a functional. One approach involves adding a bias correction to the estimator where the nonparametric estimate is plugged in. Another involves a smoothing correction to the nonparametric estimator. An important class of nonparametric estimators, those that are idempotent transformations of the empirical distribution, have this smoothing correction built in, so that undersmoothing is not needed for $\sqrt{n}$-consistency; this class includes orthogonal series density estimators, series estimators of conditional expectations, and sieve estimators.

The influence function plays a key role in achieving $\sqrt{n}$-consistency without undersmoothing. For the additive correction to a plug-in estimator it must be possible to construct a nonparametric estimator of the influence function that converges sufficiently fast. For the smoothing adjustment to the nonparametric estimator the influence function must be smooth as a function of the data. These properties may or may not be intrinsic to the functional. A topic of future research would be to seek to identify the class of functionals for which $\sqrt{n}$-consistent estimation without undersmoothing is possible.

We  have  discussed  bias  correction  and  undersmoothing  for  an  increasingly  general 
class  of  estimators,  beginning  with  functionals  of  a  density  and  ending  with 
semiparametric  m-estimators.     We  found  that  for  semiparametric  m-estimators 
undersmoothing  is  not  needed  when  nonparametric  estimation  does  not  affect  the  asymptotic 
variance  of  the  estimator.     Regularity  conditions  were  given  for  "twicing"  kernel 
estimators.     We  showed  that  this  class  of  kernel  estimators  has  a  special  property,  being 
the  outcome  of  a  smoothing  correction  applied  to  the  original  kernel,  that  removes  the 
necessity  of  undersmoothing  for  these  estimators.     Our  numerical  results  show  MSE  gains 
for  the  twicing  kernel  at  small  sample  sizes  and  less  sensitivity  to  bandwidth  choice  in 
estimation  of  the  average  density,   indicating  some  potential  for  use  in  practice. 
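One way to see why a twicing kernel can reduce smoothing bias is through its moments. The sketch below is illustrative only: it assumes a standard Gaussian K, which does not have the bounded support used in the regularity conditions, and checks numerically that 2K - K*K still integrates to one while its second moment vanishes. The leading bias term of the original second-order kernel is therefore removed.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the original kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def twicing_kernel(u):
    """Twicing kernel 2K(u) - (K*K)(u) for the Gaussian K; K*K is the N(0, 2) density."""
    k_conv_k = np.exp(-0.25 * u**2) / np.sqrt(4.0 * np.pi)
    return 2.0 * gaussian_kernel(u) - k_conv_k

# Riemann-sum approximations of kernel moments on a wide, fine grid.
grid = np.linspace(-12.0, 12.0, 240001)
step = grid[1] - grid[0]
mass = float(np.sum(twicing_kernel(grid)) * step)
second_moment_original = float(np.sum(grid**2 * gaussian_kernel(grid)) * step)
second_moment_twicing = float(np.sum(grid**2 * twicing_kernel(grid)) * step)
```

The second moment of K*K is twice that of K, so the second moment of the twicing kernel is 2 - 2 = 0 in this example.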




Appendix:  Proofs  of  Theorems 

Throughout the Appendix, c and C will represent generic positive constants
that may be different in different uses.

Proof of Theorem 2.1: By Lemma 8.10 of Newey and McFadden (1994), it follows that
$\|\hat F - F_0\|^2 = O_p(\ln(n)/(n h^{r+2d}) + h^{2s})$. Therefore, for $R_n$ in equation (2.4),
$R_n = O_p(\sqrt{n}\|\hat F - F_0\|^2) = O_p(\ln(n)/(\sqrt{n} h^{r+2d}) + \sqrt{n} h^{2s})$. Also,
$B_n = \sum_{i=1}^n e(z_i,h)/\sqrt{n}$ for $e(z,h) = \int K(u)[\delta(z+hu) - \delta(z)]du$.
By continuity of $\delta(z)$, $\delta(z+hu) - \delta(z) \to 0$ for almost all $u$ as $h \to 0$.
Also, for small enough $h$, by $K(u)$ having bounded support $\mathcal{U}$,
$\sup |K(u)[\delta(z+hu) - \delta(z)]| \le 1(u \in \mathcal{U}) \sup_u |K(u)| \cdot 2 \sup_{\|\Delta\| \le \varepsilon} |\delta(z+\Delta)| = b(z,u)$,
which is finite with probability one by $\delta(z)$ bounded and has finite integral over $u$
by $K$ having compact support, so by the dominated convergence theorem $e(z,h) \to 0$ as
$h \to 0$ with probability one. Also, $|e(z,h)|^2 \le [\int b(z,u)du]^2 \le C$, so by the
dominated convergence theorem $\mathrm{Var}(B_n) = \mathrm{Var}(e(z,h)) \le E[e(z,h)^2] \to 0$.
Also, by the usual mean-value expansion,

(A.1)  $\int K(u)[f_0(z-hu) - f_0(z)]du$

  $= \int K(u)\{\sum_{j=1}^{s-1} ((-h)^j/j!) \sum_{|\lambda|=j} u^\lambda \partial^\lambda f_0(z) + ((-h)^s/s!) \sum_{|\lambda|=s} u^\lambda \partial^\lambda f_0(z - \bar h u)\}du$

  $= (-h)^s \sum_{|\lambda|=s} \int K(u) u^\lambda \partial^\lambda f_0(z - \bar h u)du / s!$,

where $|\bar h| \le |h|$ and dependence of $\bar h$ on $z$ and $u$ is suppressed for notational
convenience. Then for small enough $h$,

$|E[e(z,h)]| = |\int\int K(u)[\delta(z+hu) - \delta(z)]du\, f_0(z)dz| \le \int |\delta(z)|\, |\int K(u)[f_0(z-hu) - f_0(z)]du|\, dz$

  $\le C h^s \sum_{|\lambda|=s} [\int |K(u)||u^\lambda| du] \int \sup_{\|\Delta\| \le \varepsilon} |\partial^\lambda f_0(z+\Delta)| dz = O(h^s)$.

Then by the Markov inequality, $B_n = O_p(\sqrt{n} h^s + o(1))$, giving the first conclusion.
Furthermore, under the stated bandwidth conditions both $B_n \to 0$ and $R_n \to 0$ in
probability, giving the second conclusion. QED.
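The order of the smoothing bias in (A.1) can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the density to be standard normal and K to be a Gaussian kernel (second order, s = 2, and without the bounded support assumed in the text), for which the smoothed density is available in closed form. The ratio of the bias to h^2 should approach the second derivative of the density at zero times half the second kernel moment.

```python
import numpy as np

def smoothing_bias_at_zero(h):
    """Exact integral of K(u)[f0(z - h u) - f0(z)] du at z = 0 for Gaussian K and f0."""
    # f0 is N(0, 1); smoothing with a Gaussian kernel of bandwidth h gives N(0, 1 + h^2).
    smoothed = 1.0 / np.sqrt(2.0 * np.pi * (1.0 + h**2))
    unsmoothed = 1.0 / np.sqrt(2.0 * np.pi)
    return smoothed - unsmoothed

bandwidths = np.array([0.2, 0.1, 0.05])
ratios = np.array([smoothing_bias_at_zero(h) / h**2 for h in bandwidths])
# Limit implied by the second-order expansion: f0''(0) * (second kernel moment) / 2.
limit = -1.0 / (2.0 * np.sqrt(2.0 * np.pi))
gaps = np.abs(ratios - limit)
```

The entries of `gaps` shrink as the bandwidth decreases, consistent with a bias of order h^s for s = 2.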


The following Lemma is useful in the proofs to follow. Let $w$ denote a random
variable, $g(z) = E[w|z]$, and $\gamma(z)$ denote a possible value of $g(z)f_0(z)$.

Lemma A.1: If Assumption 2 is satisfied, $g(z)$ and $f_0(z)$ are continuous, $m(\gamma)$ is
linear, and $|m(\gamma)| \le C\|\gamma\|$, then for $\bar\gamma(\cdot) = E[wK_h(\cdot - z)]$, $E[m(wK_h(\cdot - z))] = m(\bar\gamma)$.


Proof: By $K(u)$ having bounded support and $\mathcal{Z}$ compact there is a compact set $\mathcal{G}$ such
that $K_h(z-u) = 0$ for all $z \in \mathcal{Z}$ and $u \notin \mathcal{G}$. Let $\gamma_1(z) = \int_{\mathcal{G}} \gamma(u)K_h(z-u)du$ and
$\gamma_2(z) = \int_{\mathcal{G}^c} \gamma(u)K_h(z-u)du$. For $z \in \mathcal{Z}$, $\gamma_2(z) = 0$, so that $\|\gamma_2\| = 0$ and
$|m(\gamma_2)| \le C\|\gamma_2\| = 0$. Then by linearity of $m(\gamma)$, $m(\bar\gamma) = m(\gamma_1 + \gamma_2) = m(\gamma_1)$. Also, by
$\|K_h(\cdot - z)\| = 0$ for all $z \notin \mathcal{G}$, $m(\gamma(z)K_h(\cdot - z)) = 0$ for all $z \notin \mathcal{G}$, so by linearity
of $m(\gamma)$,

$E[m(wK_h(\cdot - z))] = E[w\, m(K_h(\cdot - z))] = E[g(z) m(K_h(\cdot - z))] = E[m(g(z)K_h(\cdot - z))] = \int_{\mathcal{G}} m(\gamma(u)K_h(\cdot - u))du$.

Let $\mu_J$ be a sequence of measures with finite support that converge in distribution to
the uniform measure on $\mathcal{G}$. Then, by continuity of $\gamma(z)$, continuous differentiability of
$K(v)$ to order $d$, and $\mathcal{Z}$ compact, $\gamma(z)K_h(\cdot - z)$ is continuous in $z$ in the semi-norm $\|\cdot\|$.
Hence $m(\gamma(z)K_h(\cdot - z))$ is continuous and bounded in $z$ on $\mathcal{G}$. It follows that
$\int_{\mathcal{G}} m(\gamma(z)K_h(\cdot - z))d\mu_J \to \int_{\mathcal{G}} m(\gamma(z)K_h(\cdot - z))du$. Also, since each derivative of
$\gamma(u)K_h(z-u)$ with respect to $z$ of up to order $d$ is bounded and continuous on $\mathcal{G}$, it
follows that $\|\int_{\mathcal{G}} \gamma(u)K_h(\cdot - u)d(\mu_J - \mu)\| \to 0$, and hence
$m(\int_{\mathcal{G}} \gamma(u)K_h(\cdot - u)d\mu_J) \to m(\int_{\mathcal{G}} \gamma(u)K_h(\cdot - u)du)$. Furthermore, by $\mu_J$ having finite
support and linearity of $m(\gamma)$, $m(\int_{\mathcal{G}} \gamma(u)K_h(\cdot - u)d\mu_J) = \int m(\gamma(u)K_h(\cdot - u))d\mu_J$. Then by the
triangle inequality,

$m(\bar\gamma) = m(\int_{\mathcal{G}} \gamma(u)K_h(\cdot - u)du) = \int_{\mathcal{G}} m(\gamma(u)K_h(\cdot - u))du = E[m(wK_h(\cdot - z))]$. QED.


Proof  of  Theorem  2.2:      For     T       from  Section  2, 

n 
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(A.2)  It    I    s  VES\Siz)-5{z)\\hz)-rAz)\dz 

n  O 

£  v/Kj-blzJlfCzj-fQCzJIdzllF-FQll   +  v^JI  D(z,F-Fq)  |  |  f(z)-fQ(z)  |dz 

<  ZV^JblzJdzllF-F^II      =  0  (ln(n)/V^h  +  /Kh     ). 

0  p 

Next,   by     b{z)     bounded,     E(b(z)]  <  m.      Also,   by  Assumptions  1  and  3,    |D(z,K,  ('-z))  |    :£ 

-r-d  _  »  _ 

b{z)IIK  ('-z)!!   ^  Cb{z)h         .      Let     f(z)  =  E[f(z)]     with  corresponding  charge     F,     and  note 

that     D(z,F-F    )  =  D(z,F-F)  +  D(z,F-F    ).      Also,   by  Lemma  8.10  of  Newey  and  McFadden 

(1994),      IIF-F    II   =  0{h^).     Then  by  Chebyshev's  inequality, 

VnjD(z,F-F^)(P-F„)(dz)  =  0   ({E[D(z,F-F-)^]}^^^)  =  0   (h^). 
0  0  p  0  p 

Now  let     k{z,u)   =  D(z,K^('-u)),     kAz)  =  J'«:(z,u)r  (u)dz,     and     kjz)  =  J^(u,z)f^(u)dz. 

hi  0  Z  0 

By  Lemma  A.  1,     ^^(z)  =  D(z.,F).     Also,     E[  |ft(z,z)  |  ]  ^  Ch""^"^     and     {E[^(z,z)^]>^'^^  £ 

-r-d 
Ch  ,      so  by  linearity  of     D(z,F)     and  a  V-statistic  projection  result  like  Lemma  8.4 

of  Newey  and  McFadden   (1994), 

(A.3)  /SjD(z,F-F)(P-FQ){dz) 

-  v/K{Jfc(z,u)(PxP)(dz,du)  -  SlkAz)+kJz)]P(dz}  +  E[^,(z)]}  =  0  (n'^^V''"'^). 

1  Z  1  p 

The  triangle  inequality  and  linearity  of     D(z,F)     in     F     then  give 
V^J'D(z,F-FQ)(P-FQ)(dz)     =  0  {n~^'^\~^~^  +  h^).     Then  by  Assumption  3,   for     h  ^  0, 

(A.4)  IS    I    5  •/n|J'D(z,F-F^)(P-F^)(dz)|    +  [J"b(z)(P+F-)(dz)WllF-F-ll^ 

n  0  0  0  0 

^'  ,   -1/2, -r-Zd       .  s        /-H  2s, 
=  O  (n        h  +  h     +  vnn     ). 

P 

The conclusion then follows by equation (A.2) and the triangle inequality. QED.

Proof  of  Theorem  3.1:     For     c     in  the  statement  of  the  Theorem,  let     d-(z)  = 
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^^PiiAii        i-vi_    1^  f^(z+A)|.      By  a  mean-value  expansion  like  that  in  the  proof  of  Theorem  1 
and  boundedness  of  the     t         derivatives  of     5(z)     it  follows  that 

IX[5(z+hu)-5(z)]K(u)du|    ^  Ch       for  small  enough     h.      Also,   applying  the  same  argument 
with     K     replacing     K,     for     6     and     f     in  equation  (3.5),      |6(z)-5(z)|    :£  Ch       and 
|f(z)-fQ(z)|    £  Ch^d^(z).     Therefore,     Var(B^)  £  E[  |X[5(z+hu)-5(z)]K(u)du  |  ^]  =  0(h^^) 
and,   as  in  equation  (3.5),   and      |E[B^]|    ^  Vn'Ch^'^^Jd^(z)dz  =  0(h^"^^).     The  first 
conclusion  then  follows  by  the  Markov  inequality.     Also,   it  is  easy  to  check  that     K(u) 


0 


also  satisfies  Assumption  2,   so  that  by  Lemma  8.10  of  Newey  and  McFadden  (1994),    IIF-F 
=  0   (ln(n)/nh  +  h     ).      The  second  conclusion  then  follows  from  Assumption  1.      QED. 

Proof  of  Theorem  3.2:      Equation  (2.4)   is  satisfied  for     F     having  Radon-Nikodym 

derivative     f(z)  =  p"^(z)'a     and     B     =  y'.",[5(z.)-5(z.)]/V^.     Then       Var(B   )  £ 

n       ^1=1         1  1  n 

E[{6(z.)-5(z.)}^]  £  Cj[5(z)-5(z)]^dz  =  0(.f^^^^)     by  the  density  bounded  and  the 
11 

approximation order given in the statement of the Theorem. Also, by the Cauchy-Schwarz
inequality,      |E[B^]|    <  v^[J'{5(z)-5(z))^dz]^^^[J{f (z)-fQ(z)}^dz]^^^  =  0(v^j"^''''"'^'^^)  -^  0. 
The  first  conclusion  then  follows  by  the  Markov  inequality.     The  second  conclusion 
follows  by  Assumption  1  and  the  triangle  inequality.     QED. 


Proof  of  Theorem  4.1:   Note  that 


(A.5)  M     =  XI-^-^(z-.Z-)/[n(n-l)],     fc(z.,2.)  =  s'^K,  (x.-x.)y .v., 

f^c       ^^\*j       i'   J  i'  J  h     1     J  ■'j  i' 


where  we  suppress  dependence  of     fc     on     h     for  notational  convenience.     Define     a(x)  = 
J'K(u)a(x+hu)du     and     b(x)  =  J'K(u)b(x+hu)du.     Taking  expectations,  and  integrating  by 
parts,  it  follows  similarly  to  equation  (3.5)  that  for     |j     =  Ja(x)b(x)dx, 

(A.6)  E[m^]  -  fiQ  =  E[^(z^.Z2)]  -  Mq  =  S{S8^K^{x^-x^)Ely\x^]r^{x^)dx^}hU^)dx^  -  Mq 

=  JjKj^(x^-X2)a(x2)b(x^)dx^dx2  "  Mq  =  -J[a(x)-a(x)][b(x)-b(x)]dx. 
By a mean value expansion in h like equation (A.1),

a(x)-a(x)  =  (hVsDV,^  ,_  jK(u)u'^S'^a(x+hu)du,      |h|    ^    |h|. 

I  A  I  -S 

For     c     in  the  statement  of  the  theorem  let     d  (x)  =7,,,      sup,,,,,      \d   a(x+A)|.      By     K(u) 

having  bounded  support     U,     for  small  enough     h,      lK(u)r.     ._  u  d  a(x+hu)|    s  CKuelild  (x) 

I  A  I  — s  a 

<  00.     Then  by  the  dominated  convergence  theorem,     JKluju  5  a(x+hu)du  — >  C-v^   a(x),   so  that 

A 

h~^[a(x)-a(x)]  -^  I,^,      C-.5'^a(x)/s!.     Similarly,     h"^b(x)-b(x)]  -4  E,  ^  ,  _,C^5'^b(x)/t!. 

I  A  I  — S    A  I  A  I  — t    A 

Also,   noting  that      |h~^[a(x)-a(x)]  |    £  Cd  (x)     and      1  h~^[b(x)-b(x)]  |    <  C     for     Jd  (x)dx  <  oo 

a  a 

-s+t 
by  Assumption  6,   the  dominated  convergence  theorem  gives     h         (E[fi]  -  fi   )  = 

h^^^Jlalxl-aCxlliWxj-bCxlldx  -^  B,      so  that 

(A.7)  E[il  ]  =  Mn  +  h^^B  +  o(h^^). 

c  0 


Next, note that $\hat\mu$ is a U-statistic with kernel $[k(z_i,z_j) + k(z_j,z_i)]/2$. Then by
Serfling (1980),

(A.8)  $\mathrm{Var}(\hat\mu) = [(n-2)/n(n-1)]\,\mathrm{Var}(E[k(z_1,z_2) + k(z_2,z_1) | z_1])$

  $+ [1/n(n-1)]\{\mathrm{Var}(k(z_1,z_2)) + \mathrm{Cov}(k(z_1,z_2), k(z_2,z_1))\}$.


By  Assumptions  2  and  5  and  another  application  of  the  dominated  convergence  theorem, 

J'J[S'^K(u)]^M     (x-hu)/j       (x)dudx  ^  Q,   =  J[5'^K(u)]^duJ/j     (x)/j       (x)dx.     Then  by  a  change  of 
yy  WW  1  ^yy      "^ww  -^  ^ 

variables,     u  =  (x  -x  )/h,     x  =  x  , 

(A.9)  E[Wz^,Z2)^]  =  JJ[5^K^(x^-X2)]^E[y2lx2]E[v^|x^]fQ(x2)fQ(x^)dx2dx^ 

=  h~^~^^^^ ssid^uu)]^  (x-hu)M    (xMudx  =  h'^^'^'^'o,  +  om"''"^'^'). 

yy  WW  1 

Also,  since  E[A:(z  ,z  )]  converges,  E[Wz  ,z  )]  =  o(h       ),  so 
VarCftCz^.z^))  =  E[k{z^,z^)h   +  oCh"^"^'*^' ).  Therefore, 
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(A.IO)  [l/n(n-l)]Var(ft(z^,Z2))  =  [l/n(n-l)]h  '^  ^'^'q^  +  o([l/n(n-l)]h  ^  ^''^'). 

-2^-r-2|Al  ,   -2,-r-2|A|, 

=  n     h  Q     +  o(n     h  ) 

Also,   by     a'^K(-u)   =  (-1) ''^ '  a'^R(u),      it  follows  similarly  to  eq.    (A. 9)  that 


Therefore,   analogously  to  equation  (A.IO)   it  follows  that 

(A.ll)  ll/n{n-l)]Cov{k{z^,z^),kU2,z^))  =  nA"'^"^' ^  '  (Q-Q^)  +  otn^V"""^' ^  ' ). 

Next,   let     lix]  =  jK(u)£{x+hu)du     and     a(x)  =  XK(u)a(x+hu)du.     By  applying  the  dominated 
convergence  theorem  as  we  have  done  previously  it  follows  from  Assumption  7  that     2(x)  - 

Ux)     and     a(x)  — >  a(x)     as     h  — >  0,     so  that     a(x)v  +  2(x)y  — >  a(x)v  +  £(x)y       as 

~  2  2     2 

h  — >  0.     Since     [a(x)v  +  £(x)y]     ^  C(v  +y  ),     Assumption  7  and  the  dominated  convergence 

theorem  imply  that     Var(a(x)v+?(x)y)  — >  Var(i//(z)).     Furthermore,  by  integration  by  parts 

and  interchanging  the  order  of  differentiation  and  integration, 

Emz^,z^)+k{z^,z^)\z^]  =  E[S'^K^(x^-X2)y2V^|z^]  +  ElS'^K^Cx^-x^ly^v^lz^] 

=  E[S'^K,  (x -x„)E[y„|x„]|z,]v,   +  E[a'^K.  (x„-x,)E[v„  |x„]  Izjy, 
hl2         22       11  h21         22       11 

=  [ja'^Kj^(x^-x)E(y|x]fQ(x)dx]v^  +  (-1) ''^ '  ja'^Kj^(x^-x)b(x)dx]y^  =  a(x)v  +  2(x)y. 

Therefore, 

(A.12)  [(n-2)/n(n-l)]Var(E[fe(z^,Z2)+^(z2,z^)|z^])  =  Var(i/>(z))/n  +  o(n"^). 


Plugging the results from (A.10)-(A.12) into (A.8), and using
$\mathrm{MSE}(\hat\mu) = \mathrm{Var}(\hat\mu) + (E[\hat\mu] - \mu_0)^2$ to combine equations (A.7) and (A.8), then gives the result. QED.
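For intuition about the U-statistic form in (A.5), the following sketch computes the simplest member of this family: the average density estimator, where no derivative is taken and the weights y and v are identically one. This special case, the Gaussian kernel, and the simulated standard normal data are assumptions made for illustration; the proof itself covers derivative kernels with bounded support.

```python
import numpy as np

def average_density_u_statistic(x, h):
    """U-statistic estimate of the integral of f0(x)^2, summing over pairs with i != j."""
    n = x.shape[0]
    diffs = (x[:, None] - x[None, :]) / h
    kernel_values = np.exp(-0.5 * diffs**2) / (np.sqrt(2.0 * np.pi) * h)
    # Drop the i == j terms so that the sum runs over i != j only, as in a U-statistic.
    np.fill_diagonal(kernel_values, 0.0)
    return float(kernel_values.sum() / (n * (n - 1)))

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
estimate = average_density_u_statistic(x, h=0.3)
true_value = 1.0 / (2.0 * np.sqrt(np.pi))  # integral of the N(0, 1) density squared
```

The gap between `estimate` and `true_value` combines the two pieces that the MSE decomposition separates: a smoothing bias that depends on h and sampling variability that shrinks with n.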

Proof  of  Theorem  4.2:      By  Assumption  8  and     K(u)     having  bounded  support,     hu     can  be 

made  as  small  as  desired  uniformly  over  the  support  of     u     by  choosing     h     small  enough. 

Then  for     h     small  enough,      |a(x)-a(x)|    =    |  JK(u)[a(x+hu)-a(x)]du|    £ 

J|K(u)|  |a(x+hu)-a(x)|du  ^  J|K(u)|C  (x)llhull^du  =  h^C  (x)/]  K(u)  |  llull^du  =  Ch^C  (x)     and 

similarly,      |b(x)-b(x)|    s  Ch  C  (x).      It  then  follows  by  equation  (A. 6)  that 

a 

lEfjl  ]-/i^|    ^  T|i(x)-a(x)|  |b(x)-b(x)|dx  <  Ch^^Vc  (x)Q(x)dx. 
c       0  a         b 

The  conclusion  now  follows  from  plugging  (A.10)-(A.12)   into  equation  (A. 8),   similarly  to 
the  proof  of  Theorem  4.1.     QED. 

Proof  of  Theorem  5.1:      By  a  change  of  variables, 


Jy(u)r(u)du  =  JJy(u)wK^^(u-x)duP(dz)  =  J"[Ji^(x+hu)K(u)du]wP(dz)  =  Jy(x)wP(dz). 

h 


Then  by  Assumption  9, 


J"D(z,y-3-   )F   (dz)  =  Ji^(u)[9^(u)-3-   (u)]du  =  J'i^(u)3'(u)du  -  E[v(x)w] 


J"i}(x)wP(dz)  -  E[y(x)w]  =  B     +  JvMMP-FJidz). 

n  0 


The  decomposition  in  the  statement  of  the  Theorem  then  follows.     Next,  by  Lemma  8.10  of 

Newey  and  McFadden  (1994),      llf-y„ll   =  0   (p   ).     Let     S     =  JD(z,F-F„)(P-F„)(dz).      By 

Opn  n  00 

Markov's  inequality,     Jb(z)(P+F   )(dz)  =  0  (1),     so  by  Assumption  9, 

(A. 12)  V^IIS  -S    II   <  [Jb(z)(P+F„)(dz)Wllf-roll^  =  O  iVRph. 

n     n  0  "    °0  p         n 


Let     -yix)  =  ElyCx)]  =  J"K  (x-u)^'   (u)du  =  JKCu)?-   (x-hu)du.     Then  by  linearity  of     D{z,y) 
in     r,     D(z,J-3f^)  =  D(z,3-?^)  +  D(z,i^-ar   ).     Let     k{z.,x.)  =  D(z.,w.K  (--x.)),     I  (z)  = 
J"fc(u,z)FQ(du)  =  J'D{u,wKj^(--x))FQ(du),     and     kAz)  =  /^(z.ujdF^Cdu).     By  Lemma  A.l,     k^iz) 
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=  D(z,^).     Then  by  linearity  of     Dtz.j-)     in     i(,     it  follows  as  in  equation  (A. 3)  that 

V^JD(z,y-r)(P-F^)(dz)  =  0   (h~'~~^/v/n).      Also,     v^JD(z,^-r^)(P-F^)(dz)  -  0   (h^),      so  by  the 
Op  0  0  p 

triangle  inequality,      S     =  JD(z,F-F_)(P-F^)(dz)  =  0   [h'^'^^/Vn  +  h^).     Then  by  the 

n,  00  p  -^ 

triangle  inequality  and     h         /■/n  ^  h  /Vn     for     n     large  enough, 

(A.  14)  S     =  O  (/np^)  +  O   [h^'^^/yfR  +  h^)  =  0  [yfRp^  +  h^). 


Next, 

(A.15)  |R    I    s  VnS\m{z,z)-m{z,rJ-Diz,r-7^]\Fldz)  ^  E[b(z)]^/nllT-y^ll^  =  0  iVnp^). 

n  0  0       0  Op         n 

Next,   let     vix)   =  JK(u)y(x+hu)du     and     ^-(x)   =  J'Ktulj'Cx+huldu.     Then  it  follows  as  in 
equation  (3.5)  that 

E[B  ]  =  VnEl{vU)-v(x)}w]  =  V^J'[i5(x)-i^{x)]t   (x)dx  =  ■/nJ"[y(x)-i^(x)]Wx)-y-(x)]dx 
n  0  0 

Let     d^(x)  =  suP||^„^^_l^l^^|S^(x+A)|      and     d^(x)  =  sup,,^,,^^  ,^  I^J  5\(x+A)  | .      It 

follows  as  in  the  proof  of  Theorem  3.2  that     lly{x)-i'(x)ll   :£  Cd   (x)h  ,      \\vix)-v[x)\\   ^ 

Cd   (x)h  ,     and      \\-y{x)--yAx)[\   s  Cd   (x)h  .     Therefore,      |E[B   ]|    =  0(.Vnh       )     and     Var(B   )  £ 
V  0  If  n  n 

E[llD(x)-i^(x)ll^llwll^]  =  0(h^^),     so     B     =  0  (Vnh^"^^  +  h^)     holds  by  the  Markov  inequality, 

giving  the  first  conclusion.     The  second  conclusion  follows  by     v(x]  =  0,     and  hence     B 

~  t+s  t 

=  0,     and  the  fact  that  the  twicing  kernel  was  only  used  to  show     B     =0  (v^nh        +  h  ). 

QED. 


Proof  of  Theorem  5.2:      It  follows  by  Lemma  6  that     J^._  m(z.,y)/v/n  — ^  N(0,Var(^//(z))). 
Also,  for     M  =  X;.^^3m(z..pQ,rQ)/5p/n,      IIM-MII   s  [^^^bCzJ/nKllp-pQll^  +  llf-yQ"^)  -^  0, 
so     M  — ^  M     follows  by  the  triangle  inequality  and  Khintchine's  law  of  large  numbers. 
Then  by  the  continuous  mapping  theorem,     M      — ^  M    ,     so  the  conclusion  follows  by 
Slutsky's theorem. QED.